Automated database analysis to detect malfeasance

ABSTRACT

In various embodiments, systems, methods, and techniques are disclosed for analyzing various entity data items including users, computing devices, and IP addresses, to detect malfeasance. The data and/or database items may be automatically analyzed to detect malfeasance, such as criminal activity to disguise the origins of illegal activities. Various money laundering indicators or rules may be applied to the entity data items to determine a likelihood that money laundering is occurring. Further, the system may determine one or more scores (and/or metascores) for each entity data item that may be indicative of a likelihood that it is involved in money laundering. Scores/metascores may be determined based on, for example, various money laundering scoring criteria and/or strategies. Account entities may be ranked based on their associated scores/metascores. Various embodiments may enable an analyst to discover various insights related to money laundering.

CROSS-REFERENCE TO RELATED APPLICATIONS

Any and all applications for which a foreign or domestic priority claim is identified in the Application Data Sheet as filed with the present application are hereby incorporated by reference under 37 CFR 1.57.

This application is a continuation of U.S. patent application Ser. No. 14/639,606 entitled “AUTOMATED DATABASE ANALYSIS TO DETECT MALFEASANCE” filed Mar. 5, 2015, which claims benefit of U.S. Provisional Patent Application Ser. No. 62/036,519 entitled “MONEY LAUNDERING DETECTION AND SCORING” filed Aug. 12, 2014. Each of these applications is hereby incorporated by reference herein in its entirety.

This application is related to but does not claim priority from U.S. Provisional Patent Application No. 61/919,653, filed Dec. 20, 2013, U.S. Provisional Patent Application No. 61/952,032, filed Mar. 12, 2014, and U.S. patent application Ser. No. 14/251,485, filed Apr. 11, 2014. Each of these applications is hereby incorporated by reference herein in its entirety and for all purposes.

BACKGROUND

Embodiments of the present disclosure generally relate to data analysis and, more specifically, to detecting and scoring potential money laundering activities.

In financial and security investigations, an analyst may have to make decisions regarding data items (e.g., individual pieces of information) within a collection of data. For instance, the analyst may have to decide whether an entity data item represents an entity involved in money laundering. However, an individual entity data item, such as a lone device, financial account, or IP address, oftentimes includes insufficient information for the analyst to make such decisions.

SUMMARY

The systems, methods, and devices described herein each have several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of this disclosure, several non-limiting features will now be discussed briefly.

In the example noted above, the analyst may make better decisions based upon a collection of related data items. For instance, a computing device may be related to two financial accounts because the same device accesses both accounts. Various systems, such as those disclosed in the following patent applications, may assist the analyst in identifying data items that are directly related to an initial data item.

Docket No. | Serial No. | Title | Filed
PALAN.235PR | 61/800,887 | GENERATING PRIORITIZED DATA CLUSTERS WITH CUSTOMIZABLE ANALYSIS STRATEGIES | Mar. 15, 2013
PALAN.235A1 | 13/968,265 | GENERATING DATA CLUSTERS WITH CUSTOMIZABLE ANALYSIS STRATEGIES | Aug. 15, 2013
PALAN.235A2 | 13/968,213 | PRIORITIZING DATA CLUSTERS WITH CUSTOMIZABLE SCORING STRATEGIES | Aug. 15, 2013
PALAN.235A1P1 | 14/139,628 | TAX DATA CLUSTERING | Dec. 23, 2013
PALAN.235A1P2 | 14/139,603 | MALWARE DATA CLUSTERING | Dec. 23, 2013
PALAN.235A1P3 | 14/139,713 | USER-AGENT DATA CLUSTERING | Dec. 23, 2013
PALAN.235A1P4 | 14/139,640 | TREND DATA CLUSTERING | Dec. 23, 2013
PALAN.245PR | 61/919,653 | ACCOUNT DATA CLUSTERING | Dec. 20, 2013
PALAN.245PR2 | 61/952,032 | FRAUD DETECTION AND SCORING | Mar. 12, 2014
PALAN.245A | 14/251,485 | FRAUD DETECTION AND SCORING | Apr. 11, 2014

The disclosures of each of the above-noted applications are hereby incorporated by reference herein in their entireties and for all purposes.

For example, the analyst could initiate an investigation with a single suspicious entity, such as a computing device that has been known to access financial accounts involved in money laundering. If the analyst examined this data item by itself, then the analyst may not observe any suspicious characteristics. However, the analyst may request a list of data items related to the initial data item by a shared attribute, such as financial accounts that have also been accessed by the computing device. In doing so, the analyst may discover an additional data item, such as a money transfer transaction between an accessed financial account and an additional financial account. The analyst may then mark the additional financial account as potentially involved in money laundering, based upon the additional data items located through relation to the computing device.

Although these systems can be very helpful in discovering related data items, they typically require the analyst to manually repeat the same series of searches for many investigations. Repeating the same investigation process consumes time and resources, such that there are oftentimes more investigations than can be performed. Thus, analysts typically prioritize investigations based upon the characteristics of the entities (e.g., IP address entities; computing device entities, with characteristics such as device ID and device fingerprint; financial account entities; and financial account user entities, with characteristics such as user ID, password, PIN, and financial account number). However, there may be only insignificant differences between the entities, so the analyst may not be able to determine the correct priority for investigations. For instance, the analyst may need to choose between two potential investigations based upon separate IP addresses potentially involved in money laundering. One investigation could reveal more likely money laundering activity than the other, and therefore may be more important to pursue. Yet, the characteristics of the two original entities could be similar, so the analyst may not be able to choose the more important investigation based on characteristics of the two original entities alone. Accordingly, without more information, prioritizing investigations may be difficult and error-prone.

According to various embodiments, a data analysis system (“the system”) is disclosed that may automatically detect, based on various entities (e.g., IP addresses, computing devices, financial accounts, and/or their users) and their characteristics (e.g., relationships, properties, etc.), a likelihood of money laundering and/or malfeasance. For example, when analyzing a specific entity (e.g., a mobile device), the data analysis system may use the entity's characteristics (e.g., device ID, time of access to a financial account website by that device ID, geolocation from which the access occurred, etc.) to detect likely money laundering activity and/or malfeasance by that entity. Money laundering activity may include, for example, concealment of the origins of illegally obtained money, typically by means of transfers involving foreign banks or legitimate businesses. For example, a money laundering activity may include a user logging on to a bank's website, via a computing device having a specific IP address, to transfer illegal money between a foreign financial account and a US financial account.

In another example, money laundering activity may be more complex. For example, money laundering activity may involve transfer of money along a known drug trade route. In one such scenario, money moving from a bank located in any of Colombia, Venezuela, or Argentina, then to a bank in any of Mexico, Ecuador, or the Bahamas, and then on to a bank in the United States, may constitute potential money laundering activity. As another example, if financial accounts were accessed from a device or IP address at various physical locations along the same route (e.g., the device or IP address being physically located, during access to a financial account, in any of Colombia, Venezuela, or Argentina, then in any of Mexico, Ecuador, or the Bahamas, and then in the United States), then this activity may constitute potential money laundering activity.
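By way of non-limiting illustration, the route-based check described above might be sketched as follows. This is a minimal sketch, not the disclosed implementation: the names `ROUTE_STAGES` and `follows_route` are hypothetical, locations are assumed to be country names already ordered by access time, and the choice to ignore unrelated locations between stages is an assumption of the sketch.

```python
from typing import Iterable, List, Set

# Hypothetical stage definitions taken from the example route in the text.
ROUTE_STAGES: List[Set[str]] = [
    {"Colombia", "Venezuela", "Argentina"},
    {"Mexico", "Ecuador", "Bahamas"},
    {"United States"},
]


def follows_route(locations: Iterable[str],
                  stages: List[Set[str]] = ROUTE_STAGES) -> bool:
    """Return True if the time-ordered locations pass through every stage in order.

    Locations that do not belong to the stage currently being looked for are
    skipped, so unrelated accesses between stages do not break the match
    (an assumption of this sketch).
    """
    stage_index = 0
    for country in locations:
        if stage_index < len(stages) and country in stages[stage_index]:
            stage_index += 1
    return stage_index == len(stages)
```

In such a sketch, the same function could be applied either to the sequence of bank locations in a chain of transfers or to the sequence of locations from which a device or IP address accessed a financial account.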

Financial institutions, such as banks, may desire efficient ways to identify such money laundering activities to reduce and/or combat the legitimization of illegal money. Further, financial institutions may be under obligations to comply with regulatory requirements and/or be under scrutiny from governments to prevent or detect such activities. Entities that exhibit money laundering behaviors and/or have particular characteristics or patterns of characteristics or activity may be identified by the system.

In various embodiments, the system may comprise one or more computer readable storage devices configured to store one or more software modules including computer executable instructions; a plurality of entity data items; a plurality of entity activity data items, the entity activity data items each associated with one or more respective financial accounts and associated with at least one entity data item of the plurality of entity data items; a plurality of money laundering indicators; and a plurality of money laundering scoring criteria. The system may further comprise one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute the computer executable instructions of the one or more software modules in order to cause the computer system to access, from the one or more computer readable storage devices, the plurality of entity data items and the plurality of the entity activity data items. The system may further, for each of the plurality of entity data items, compare the plurality of money laundering indicators to the plurality of entity activity data items associated with the entity data item to determine whether the entity data item satisfies one or more of the money laundering indicators. For each of the determined entity data items satisfying one or more money laundering indicators, the system may determine raw score values for each of the satisfied money laundering indicators associated with the determined entity data item; and based on at least one of the plurality of money laundering scoring criteria, determine a score for the determined entity data item based at least on the determined raw score values of the satisfied money laundering indicators and a quantity of satisfied money laundering indicators, wherein the determined score is indicative of a likelihood that the determined entity data item is associated with money laundering.
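For illustration only, the indicator comparison and scoring described above might be sketched as follows. The names `Indicator`, `score_entity`, and `rank_entities`, the dictionary shape of an activity data item, and the particular scoring criterion (sum of raw score values plus a fixed bonus per satisfied indicator) are all assumptions of this sketch; the disclosure requires only that the score be based at least on the raw score values and the quantity of satisfied indicators.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Activity = Dict[str, Any]  # one entity activity data item (hypothetical shape)


@dataclass
class Indicator:
    """A money laundering indicator: a rule plus the raw score it contributes."""
    name: str
    raw_score: float
    is_satisfied: Callable[[List[Activity]], bool]


def score_entity(activities: List[Activity],
                 indicators: List[Indicator],
                 per_indicator_bonus: float = 1.0) -> Optional[float]:
    """Score one entity, or return None if no indicator is satisfied.

    Assumed scoring criterion: sum of raw scores of the satisfied indicators
    plus a bonus for each satisfied indicator.
    """
    satisfied = [ind for ind in indicators if ind.is_satisfied(activities)]
    if not satisfied:
        return None
    raw_total = sum(ind.raw_score for ind in satisfied)
    return raw_total + per_indicator_bonus * len(satisfied)


def rank_entities(activity_by_entity: Dict[str, List[Activity]],
                  indicators: List[Indicator]) -> List[Tuple[str, float]]:
    """Return (entity_id, score) pairs for flagged entities, highest score first."""
    scored = []
    for entity_id, activities in activity_by_entity.items():
        score = score_entity(activities, indicators)
        if score is not None:
            scored.append((entity_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, an entity that satisfies more indicators receives a higher score than one that satisfies fewer indicators with the same raw score total, which allows entities to be ranked for review.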

In some embodiments, an entity uniquely associated with each entity data item of the plurality of entity data items comprises a user of a financial account, a device, or an IP address. In some embodiments, the plurality of entity activity data items are generated based on web log data. In various embodiments, at least one money laundering indicator of the plurality of money laundering indicators comprises one or more rules configured to determine whether a first entity, uniquely associated with a first entity data item in the plurality of entity data items, accessed a financial account from one or more locations associated with illegal activity. In some embodiments, the one or more locations associated with illegal activity comprises a plurality of distinct locations, and the at least one money laundering indicator requires that the first entity, uniquely associated with the first entity data item, accessed the financial account from at least two distinct locations in the plurality of distinct locations. Similarly, in some embodiments, the at least one money laundering indicator requires that the first entity accessed the financial account from at least three distinct locations in the plurality of distinct locations in a specific order or pattern. In certain embodiments, the specific order or pattern comprises a timing requirement for the first entity accessing the financial account from the at least two distinct locations in the plurality of distinct locations. In some embodiments, the at least one money laundering scoring criteria is configured to determine a score for an entity based on whether the first entity accessed the financial account from locations that are within a threshold range of a registered address associated with the entity.
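As a non-limiting sketch of the threshold-range criterion mentioned above, the distance between each access location and a registered address might be computed with the haversine great-circle formula. The function names, the use of latitude/longitude pairs, the 100 km default threshold, and the use of a fraction as the output are assumptions of this sketch and are not specified by the disclosure.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fraction_outside_range(registered: LatLon,
                           access_points: List[LatLon],
                           threshold_km: float = 100.0) -> float:
    """Fraction of accesses farther than threshold_km from the registered address."""
    if not access_points:
        return 0.0
    outside = sum(
        1 for lat, lon in access_points
        if haversine_km(registered[0], registered[1], lat, lon) > threshold_km
    )
    return outside / len(access_points)
```

A scoring criterion could then, for example, raise an entity's score as this fraction increases; how the fraction maps to a score is left open here.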

In some embodiments, a computer-implemented method may comprise, under control of one or more hardware computing devices configured with specific computer executable instructions, enabling communication with one or more computer readable storage devices configured to store: one or more software modules including computer executable instructions; a plurality of entity data items; a plurality of entity activity data items, the entity activity data items each associated with one or more respective financial accounts and associated with at least one entity data item in the plurality of entity data items; a plurality of money laundering indicators; and a plurality of money laundering scoring criteria. The computer-implemented method may further comprise accessing, from the one or more computer readable storage devices, the plurality of entity data items and the plurality of the entity activity data items. The computer-implemented method may further comprise, for each of the plurality of entity data items, comparing the plurality of money laundering indicators to the plurality of entity activity data items associated with the entity data item to determine whether the entity data item satisfies one or more of the money laundering indicators; and for each of the determined entity data items satisfying one or more money laundering indicators: determining raw score values for each of the satisfied money laundering indicators associated with the determined entity data item; and based on at least one of the plurality of money laundering scoring criteria, determining a score for the determined entity data item based at least on a quantity of satisfied money laundering indicators.

In some computer-implemented method embodiments, the entity uniquely associated with each entity data item of the plurality of entity data items comprises a user of a financial account, a device, or an IP address. In some embodiments, a computer implemented method comprises generating the plurality of entity data items based on web log data. In some embodiments, at least one money laundering indicator of the plurality of money laundering indicators comprises one or more rules configured to determine whether a first entity, uniquely associated with a first entity data item in the plurality of entity data items, accessed a financial account from one or more locations associated with illegal activity. In some embodiments of the computer-implemented method, the one or more locations associated with illegal activity comprises a plurality of distinct locations, and the at least one money laundering indicator requires that the first entity, uniquely associated with the first entity data item, accessed the financial account from at least two distinct locations in the plurality of distinct locations. In some embodiments, the at least one money laundering indicator requires that the first entity accessed the financial account from at least three distinct locations in the plurality of distinct locations in a specific order or pattern. In various embodiments, the specific order or pattern comprises a timing requirement for the first entity accessing the financial account from the at least two distinct locations in the plurality of distinct locations. In certain embodiments, the at least one money laundering scoring criteria is configured to determine a score for an entity based on at least one access location of the financial account by the first entity.

In some embodiments, a non-transitory computer-readable storage medium stores computer-executable instructions that, when executed by a computer system, configure the computer system to perform operations comprising enabling communication with one or more computer readable storage devices configured to store: one or more software modules including computer executable instructions; a plurality of entity data items; a plurality of entity activity data items, the entity activity data items each associated with one or more respective financial accounts and associated with at least one entity data item in the plurality of entity data items; a plurality of money laundering indicators; and a plurality of money laundering scoring criteria. The instructions may further configure the computer system to perform operations comprising: accessing, from the one or more computer readable storage devices, the plurality of entity data items and the plurality of the entity activity data items; for each of the plurality of entity data items, comparing the plurality of money laundering indicators to the plurality of entity activity data items associated with the entity data item to determine whether the entity data item satisfies one or more of the money laundering indicators; and for each of the determined entity data items satisfying one or more money laundering indicators: determining raw score values for each of the satisfied money laundering indicators associated with the determined entity data item; and based on at least one of the plurality of money laundering scoring criteria, determining a score for the determined entity data item.

In various embodiments, an entity uniquely associated with each entity data item of the plurality of entity data items comprises a user of a financial account, a device, or an IP address. In some embodiments, the operations further comprise generating the plurality of entity data items based on web log data. In various embodiments, the at least one money laundering indicator of the plurality of money laundering indicators comprises one or more rules configured to determine whether a first entity, uniquely associated with a first entity data item in the plurality of entity data items, accessed a financial account from one or more locations associated with illegal activity.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited features of the present invention may be understood in detail, a more particular description of various embodiments, briefly summarized above, may be had by reference to the appended drawings and detailed description. It is to be noted, however, that the appended drawings illustrate only typical embodiments of the present disclosure and are therefore not to be considered limiting of its scope, for the present disclosure may admit to other equally effective embodiments.

FIG. 1A is a block diagram illustrating an example data analysis system, according to an embodiment of the present disclosure.

FIG. 1B is a flowchart of an example method of the data analysis system, according to various embodiments of the present disclosure.

FIG. 2A is a flowchart of an example of a money laundering data analysis method of the data analysis system, according to various embodiments of the present disclosure.

FIG. 2B is a flowchart of an example of an entity money laundering scoring method of the data analysis system, according to various embodiments of the present disclosure.

FIG. 2C is a flowchart of an example of a clustering method of the data analysis system, according to various embodiments of the present disclosure.

FIG. 2D is a flowchart of an example of a cluster scoring method of the data analysis system, according to various embodiments of the present disclosure.

FIG. 3A is a map illustration of an example of a money laundering indicator/rule, according to various embodiments of the present disclosure.

FIG. 3B is a table listing example money laundering indicators, according to various embodiments of the present disclosure.

FIG. 3C illustrates an example of related money laundering entities and/or growth of a cluster of related money laundering entities, according to an embodiment of the present disclosure.

FIG. 4 illustrates an example money laundering data analysis user interface of the data analysis system, according to an embodiment of the present disclosure.

FIG. 5 illustrates components of an illustrative server computing system, according to an embodiment of the present disclosure.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

I. Terms

In order to facilitate an understanding of the systems and methods discussed herein, a number of terms are defined below. Each of the terms defined below is a broad term and should be construed in accordance with the ordinary and customary meaning of the term to a person of ordinary skill in the art to which the systems and methods pertain, and should also be construed to include, without limitation, the descriptions provided below and other implicit meanings to a person of ordinary skill. Thus, neither the definitions below nor the examples, embodiments or illustrations contained herein should be considered limiting with respect to the meaning of these terms or the claims in which they may appear, and such examples, embodiments and illustrations should not be read into or be construed in the claims either by implication or any lack of multiple embodiments.

Ontology: Stored information that provides a data model for storage of data in one or more databases. For example, the stored data may comprise definitions for object types and property types for data in a database, and how objects and properties may be related.

Database: A broad term for any data structure for storing and/or organizing data, including, but not limited to, relational databases (for example, Oracle database, MySQL database, and the like), spreadsheets, XML files, binary files, and text files, among others. The various terms “database,” “data store,” and “data source” may be used interchangeably in the present disclosure.

Data Item (Item), Entity Data Item (Entity), or Data Object (Object): A data container for information representing specific things in the world that have a number of definable properties. For example, a data item may represent an entity such as a financial account (such as bank account, investment account, etc.), a financial account user, a transaction, a person, a place, an organization, a device (such as a server, desktop computer, smartphone or laptop), an activity, a market instrument, an IP address, or other noun. A data item may represent an event that happens at a point in time or for a duration in time. A data item may represent a document or other unstructured data source such as an e-mail message, a news report, or a written paper or article. Each data item may be associated with a unique identifier that uniquely identifies the data item. The data item's attributes (also called “characteristics”, “properties” or “relationships”) may include one or more values of a data item or metadata about the data item, such as a user's user ID or password or affiliated financial account number, a computing device's device ID, fingerprint, etc., or an IP address's associated physical location. A “user” may be considered an individual financial account user, or a company, organization, or other legal entity. The terms “data item,” “data object,” “data entity,” and “entity data item,” may be used interchangeably and/or synonymously in the present disclosure. References to an “entity” in a data or computer system context may refer to an entity data item.

Item (or Object or Entity) Type: Type of a data item. Example data item types may include Account, Device, IP Address, Transaction, Person, Event, or Document. Data item types may be defined by an ontology and may be modified or updated to include additional data item types. A data item definition (for example, in an ontology) may include how the data item is related to other data items, such as being a sub-data item type of another data item type (for example, an agent may be a sub-data item of a person data item type), and the properties the data item type may have.

Entity Activity Data Items: A data item corresponding to an event associated with an entity. Example entity activity data items include data representing access to a financial account or an action taken on a financial account (e.g., determining balance, transferring money to or from the financial account, making a trade, etc.). Entity activity data items may also include entity information (using a property or relationship), such as a device or IP address that has accessed a financial account during a specific event. In some embodiments, some or all entity activity data items may be implemented as properties of entity data items or as links to and/or from entity data items.

Properties: Attributes of a data item and/or other information associated with a data item (also referred to as “characteristics”, “attributes”, “relationships”, or “metadata”). At a minimum, each property of a data item has a property type and a value or values. Properties/metadata associated with data items may include any information relevant to that data item. For example, properties associated with an entity data item (e.g., a data item having an item type of “account”) may include an account number, an associated customer identifier, an opening date, and/or the like. In another example, a person data item may include a name (for example, John Doe), an address (for example, 123 S. Orange Street), and/or a phone number (for example, 800-0000), among other properties. In another example, metadata associated with a computer or device entity data item may include a list of users (for example, user 1, user 2, and the like), and/or an IP (internet protocol) address associated with the device at a specific point in time, a device identifier, a device fingerprint, data characteristics of the device (browsers installed or used, fonts installed, HTTP headers sent or received, etc.) among other properties.

Property Type: The type of data a property is, such as a string, an integer, or a double. Property types may include complex property types, such as a series of data values associated with timed ticks (for example, a time series), and the like.

Property Value: The value associated with a property, which is of the type indicated in the property type associated with the property. A property may have multiple values.

Link: A connection between two data items, based on, for example, a relationship, an event, and/or matching properties. Links may be directional, such as one representing a payment from person A to B, or bidirectional.

Link Set: Set of multiple links that are shared between two or more data items.
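For illustration, the terms defined so far (data item, property, property type, property value, link, and link set) might map onto a simple in-memory model such as the following. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, and an actual ontology-driven data model may differ substantially.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Property:
    property_type: str   # e.g. "string", "integer", "double"
    values: List[Any]    # a property may have multiple values


@dataclass
class DataItem:
    item_id: str         # unique identifier of the data item
    item_type: str       # e.g. "Account", "Device", "IP Address"
    properties: Dict[str, Property] = field(default_factory=dict)


@dataclass(frozen=True)
class Link:
    source_id: str
    target_id: str
    kind: str            # e.g. "accessed", "payment"
    directional: bool = False

    def connects(self, a: str, b: str) -> bool:
        """True if this link leads from a to b (or joins them, if bidirectional)."""
        if self.directional:
            return (self.source_id, self.target_id) == (a, b)
        return {self.source_id, self.target_id} == {a, b}


def link_set(links: List[Link], a: str, b: str) -> List[Link]:
    """All links shared between items a and b, regardless of direction."""
    return [link for link in links if {link.source_id, link.target_id} == {a, b}]
```

For example, a payment from person A to person B could be a directional link, while a device having accessed an account could be stored as a bidirectional link between the device item and the account item.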

Seed: One or more data items that may be used as a basis, or starting point, for generating a cluster. A seed may be generated, determined, and/or selected from one or more sets of data items according to a seed generation strategy. For example, seeds may be generated from data items accessed from various databases and data sources including, for example, databases maintained by financial institutions, government entities, private entities, public entities, and/or publicly available data sources, including web logs thereof. In various embodiments of the present disclosure, seed data items may include entities (such as a device, financial account, or IP address) satisfying a money laundering score threshold, as described below.

Cluster: A group or set of one or more related data items. A cluster may be generated, determined, and/or selected from one or more sets of data items according to a cluster generation strategy. A cluster may further be generated, determined, and/or selected based on a seed. For example, a seed may comprise an initial data item of a cluster. Data items related to the seed may be determined and associated with or added to the cluster. Further, additional data items related to any clustered data item may also be added to the cluster iteratively as indicated by a cluster generation strategy. Data items may be related by any common and/or similar properties, metadata, types, relationships, and/or the like.

Seed Generation Strategy and Cluster Generation Strategy: Seed and cluster generation strategies (which may comprise one or more “indicators” (also referred to as rules)) indicate processes, methods, and/or strategies for generating seeds and generating clusters, respectively. In various embodiments of the present disclosure, processes and methods of applying money laundering indicators to activity data to determine entities likely to be related to money laundering may be referred to as seed generation strategies. In an example, a seed generation strategy may indicate that entities having a particular property (for example, computing devices satisfying a money laundering score threshold), or entities that exhibit specific patterns of behavior across one or multiple data sources, are to be designated as seeds. In another example, a cluster generation strategy may indicate that data items having particular properties in common with (or similar to) a seed or other data item in a cluster are to be added to the cluster. Seed and/or cluster generation strategies may specify particular searches and/or rule matches to perform on one or more sets of data items. Execution of a seed and/or cluster generation strategy may produce layers of related data items. Additionally, a seed or cluster generation strategy may include multiple strategies, sub-strategies, rules, and/or sub-rules.

Cluster Scores: Values determined by various characteristics and/or attributes associated with the cluster and/or the various data items of the cluster.

Cluster Metascores: An overall cluster score. Cluster metascores may, for example, be based on a combination of cluster scores of a cluster associated with a seed.
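As a non-limiting illustration of the seed, cluster, and metascore terms above, a cluster might be grown iteratively from a seed by following links, and a metascore might be formed as a weighted combination of cluster scores. The function names, the representation of links as pairs of item identifiers, the depth limit, and the weighted-sum combination are assumptions of this sketch rather than the disclosed cluster generation or scoring strategy.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def grow_cluster(seed: str,
                 links: Iterable[Tuple[str, str]],
                 max_depth: int = 2) -> Set[str]:
    """Collect items reachable from the seed within max_depth link hops.

    Each hop corresponds to one layer of related data items being added
    to the cluster (breadth-first traversal).
    """
    neighbors: Dict[str, Set[str]] = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    cluster = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        item, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the configured number of layers
        for other in neighbors.get(item, ()):
            if other not in cluster:
                cluster.add(other)
                frontier.append((other, depth + 1))
    return cluster


def cluster_metascore(cluster_scores: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Combine named cluster scores into one metascore (assumed weighted sum).

    Scores without an explicit weight are given a weight of 1.0.
    """
    return sum(weights.get(name, 1.0) * value
               for name, value in cluster_scores.items())
```

In this sketch, a seed such as a device satisfying a money laundering score threshold would pull in the accounts it accessed at depth one, and the accounts those accounts transacted with at depth two.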

II. Overview

As mentioned above, according to various embodiments, a data analysis system (the “system”) is disclosed that may automatically detect, based on various entity activity data (e.g., account usage activity such as activity logs of financial account users, computing devices, and IP addresses, among other entities accessing one or more financial accounts), a likelihood of malfeasance. The data and/or database items may be automatically analyzed to detect malfeasance, such as criminal activity to disguise the origins of illegal activities. For example, the data analysis system may be used in conjunction with financial user and device related data items to detect money laundering activity. Money laundering activity may include, for example, a person or group engaging in financial activity to illegally transfer money or disguise the source of money. Examples of money laundering may include transfers of money, or obfuscation of the source of money, so as to transform proceeds of illegal activity into ostensibly legitimate money or legitimate assets. Money laundering may also include any misuse of the financial system (usually performed via money transfer), including terrorism financing, tax evasion, international sanction evasion, import and/or export restrictions evasion, or other prohibited activities. Money laundering may include avoiding prohibitions that may relate to the source of a transfer (e.g., money from criminal activities such as extortion, insider trading, drug trafficking, illegal gambling, tax evasion, etc.), or the destination of a transfer (e.g., money destined for a terrorist organization or to be imported to a specific country such as the US). Money laundering may also include the layering, integrating, or combining of illegal funds with legitimate funds, so as to conceal the source of funds. Money laundering may also include, in addition to money transfers, the purchase of assets that can conceal a source, such as securities, digital currencies (e.g., bitcoin), credit cards, cash, coupons, virtual currency, on-line game credits, etc.

Money laundering may take many specific forms, including bank methods, smurfing (also known as structuring), currency exchanges, and double-invoicing. Each type of money laundering can be detected with one or more appropriately crafted money laundering indicators that identify financial account users, computing devices (mobile/non-mobile), and IP addresses associated with money laundering activity. Several example types of money laundering are listed below.

Cross Nation Access (also referred to as “Cross Nation Transfers”): Deposits into and/or withdrawals from a bank or other financial account, where the deposits and withdrawals are at least partially sourced from a prohibited activity or entity, or destined for a prohibited activity or entity. Patterns of cross nation accesses may indicate such prohibited activities and indicate money laundering (e.g. a device accessing a financial account in locations according to a known pattern (such as a drug trade route), or specific money transfers between bank accounts in different countries and/or jurisdictions according to a known money laundering route).

Structuring (also referred to as “smurfing”): A method of moving money whereby cash deposits are broken into smaller deposits of money in order to reduce suspicion of money laundering and to avoid anti-money laundering reporting requirements. A sub-component of this is to use smaller amounts of money to purchase bearer instruments, such as money orders, and then ultimately deposit those, again in small amounts.

Bulk cash smuggling: This involves smuggling money to another jurisdiction and depositing it in a financial institution, such as an offshore bank, with greater bank secrecy or less rigorous money laundering enforcement.

Cash-intensive businesses: In this method, a business typically involved in receiving cash uses its accounts to deposit both legitimate and criminally derived cash, claiming all of it as legitimate earnings. Service businesses are best suited to this method, as such businesses have no variable costs, and it is hard to detect discrepancies between revenues and costs.

Trade-based laundering: This involves under- or overvaluing invoices to disguise the movement of money.

Shell companies and trusts: Trusts and shell companies disguise the true owner of money. Trusts and corporate vehicles, depending on the jurisdiction, need not disclose their true, beneficial, owner.

Round-tripping: Money is deposited in a controlled foreign corporation offshore account, preferably in a tax haven where minimal records are kept, and then shipped back as a foreign direct investment, exempt from taxation. A variant on this is to transfer money to a law firm or similar organization as funds on account of fees, then to cancel the retainer and, when the money is remitted, represent the sums received from the lawyers as a legacy under a will or proceeds of litigation.

Bank capture: In this case, money launderers or criminals buy a controlling interest in a bank, preferably in a jurisdiction with weak money laundering controls, and then move money through the bank without scrutiny.

Casinos: In this method, an individual walks into a casino with cash and buys chips, plays for a while, and then cashes in the chips, taking payment in a check, or just getting a receipt, claiming it as gambling winnings. Alternatively, money can be spent on gambling, preferably on higher odds. The wins are shown if the source for money is asked for, while the losses are hidden.

Real estate: An entity purchases real estate with illegal proceeds and then sells the property. To outsiders, the proceeds from the sale look like legitimate income. Alternatively, the price of the property is manipulated, such as by the seller agreeing to a contract that underrepresents the value of the property and receiving criminal proceeds to make up the difference.

Black salaries: A company may have unregistered employees without a written contract and pay them cash salaries. Dirty money might be used to pay them.

Tax amnesties: For example, those that legalize unreported assets in tax havens and cash.

Fictional loans: A loan might be given to legitimately transfer money from a bank that has money to launder. The loan may or may not be defaulted on.

Anti-money laundering (AML) is a term mainly used in the financial and legal industries to describe the legal controls that require financial institutions and other regulated entities to prevent, detect, and report money laundering activities. Anti-money laundering guidelines came into prominence globally as a result of the formation of the Financial Action Task Force (FATF) and the promulgation of an international framework of anti-money laundering standards. The systems and methods described herein may, in an automatic fashion, or in a semi-automated fashion, assist analysts in detecting potential money laundering (or money laundering related anomalies) in order to satisfy an institution's AML obligations.

Such anomalies may include substantial increases in funds in a financial account, a large withdrawal, or moving money to a bank secrecy jurisdiction. They may also include detecting a transfer via an account user, device, or IP address on a black list, or involve a pattern of transactions having particular characteristics. Additional rules/indicators for detecting money laundering are described herein.
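As an illustration only, anomalies of the kind listed above might be expressed as simple predicate rules over transaction records. The following Python sketch is not taken from the disclosure: the `Transaction` fields, the threshold value, the placeholder jurisdiction codes, and the black list contents are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical values chosen only for illustration.
LARGE_WITHDRAWAL_THRESHOLD = 10_000.00
SECRECY_JURISDICTIONS = {"XX", "YY"}   # placeholder jurisdiction codes
IP_BLACK_LIST = {"203.0.113.7"}        # address from a documentation-only range


@dataclass(frozen=True)
class Transaction:
    account: str
    kind: str                  # "deposit", "withdrawal", or "transfer"
    amount: float
    destination_country: str
    ip_address: str


def matched_indicators(tx: Transaction) -> List[str]:
    """Return the names of the example indicators that a transaction satisfies."""
    hits = []
    if tx.kind == "withdrawal" and tx.amount >= LARGE_WITHDRAWAL_THRESHOLD:
        hits.append("large_withdrawal")
    if tx.kind == "transfer" and tx.destination_country in SECRECY_JURISDICTIONS:
        hits.append("secrecy_jurisdiction_transfer")
    if tx.ip_address in IP_BLACK_LIST:
        hits.append("black_listed_ip")
    return hits
```

In such a sketch, a transaction that matches no indicator yields an empty list, while a transaction matching several indicators reports each of them, which leaves the weighting of the matches to a separate scoring step.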

In various embodiments, as also mentioned above, the system may receive and/or access various data items comprising entity activity related data (e.g. financial account usage, user, or device activity) related to the transfer of money or securities. The entity activity related data may be associated with one or more bank or financial account data items and may, for example, include web log data (including a device, user, and/or IP address related to mobile device or web traffic activity with a financial institution), financial transaction data, and/or various other user, device, or IP address based data.

The system may further receive and/or access various money laundering indicators or rules. Based on the money laundering indicators (which may also be referred to as rules) and the entity activity related data, the system may determine a likelihood that an entity data item is involved in money laundering, or associated with a money laundering operation. Further, the system may determine one or more scores (and/or metascores) for each entity data item that may be indicative of a likelihood an entity associated with the data item is involved in money laundering, or associated with money laundering. Scores and/or metascores may be determined based on, for example, various money laundering scoring criteria and/or strategies. As described in further detail below, entity data items may be ranked based on their associated scores and/or metascores.
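One way to picture the relationship between scores, metascores, and ranking is the following Python sketch. It is an assumption made for illustration, not the disclosed scoring strategy: the indicator names, the weights, and the weighted-sum combination are hypothetical, and the disclosure does not specify how individual scores are combined into a metascore.

```python
from typing import Dict, List, Tuple

# Hypothetical weights; the combination rule is an assumption of this sketch.
WEIGHTS: Dict[str, float] = {
    "structuring": 0.5,
    "cross_nation_access": 0.3,
    "black_list": 0.2,
}


def metascore(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-indicator scores (each assumed to lie in [0, 1]) into a weighted sum."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


def rank_entities(entity_scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (entity, metascore) pairs ordered from highest to lowest metascore."""
    ranked = [(entity, metascore(scores)) for entity, scores in entity_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Here an entity data item with no matched indicators receives a metascore of zero and sorts to the bottom of the ranking.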

Entity activity data, data items, and/or other types of data items of the data analysis system may be accessed and/or received from various databases and data sources including, for example, databases maintained by financial institutions, government entities, private entities, public entities, and/or publicly available data sources. Such databases and data sources may include a variety of information and data, such as, for example, personal information, financial information, tax-related information, web log data, web service data, mobile device usage data, mobile app log data, computer network-related data, and/or any computer-related activity data, among others. Further, the databases and data sources may include various relationships that link and/or associate data items with one another. Various data items and relationships may be stored across different systems controlled by different entities and/or institutions. According to various embodiments, the data analysis system may bring together data from multiple data sources in order to build clusters.

In various embodiments the system may further cluster the entity data items with various other related data items. Clusters of related data items may be generated from initial data items, called “seeds.” In various embodiments, seeds may comprise entity data items determined to have a high likelihood of being involved in money laundering. The processes and methods of applying money laundering indicators to entity activity data to determine entities likely to be involved in money laundering may be referred to as seed generation strategies.

Additionally, in various embodiments the system may score and/or rank entities and/or clusters of entity data items. Further, as described below, in various embodiments entity data items may be clustered in order to score and/or assess a likelihood of money laundering. For example, a particular entity data item may be used by the system as a seed even when the entity data item and/or associated entity is not initially determined to have a high likelihood of being involved in money laundering. In this example, after clustering it may be determined by the system, based on the entity data item's relationship with various other data items, that it likely is involved in money laundering.

In various embodiments, the data analysis system may enable a user to efficiently perform analysis and investigations of various data items (for example, entity data items) and/or clusters of data items. For example, the system may enable a user (also referred to herein as an “analyst”) to perform various financial, device based, and security investigations related to an entity and/or a seed (for example, an initial entity data item or data object determined to likely be involved in money laundering). In such an investigation, the system may enable an analyst to search and/or investigate several data items and/or several layers of related data items. For example, a financial account user may be a seed that is linked by the system to various data items including, for example, an IP address or computing device that regularly accesses the financial account's web or mobile interface. Further, the system may link, for example, various other computing devices or IP addresses that also access the same account or use the same user ID at a financial institution, to the seed financial account user. Accordingly, in various embodiments, the system may automatically determine and provide to a user or analyst various layers of data items related to the seed financial account. Such an investigation may, in an embodiment, enable the analyst to determine money laundering. For example, if the seed financial account user was suspected to be involved in money laundering, then the analyst may determine that the additional computing devices may also be involved in money laundering. Further, if the seed financial account user was linked to other known entities involved in money laundering, the analyst may determine that the seed financial account user was likely to be involved in money laundering. 
As mentioned above, in such an investigation the analyst may discover relationships between the additional entities and the seed financial account user through several layers of related data items. Such techniques, enabled by various embodiments of the data analysis system, may be particularly valuable for investigations in which relationships between data items may include several layers, and in which such relationships may be otherwise very difficult or impossible to manually identify.

In various embodiments, the data analysis system may automatically generate, or determine, seeds based on a seed generation strategy (also referred to as “seed generation rules”). For example, for a particular set of data items, the data analysis system may automatically generate, based on a seed generation strategy, seeds by designating particular data items (or groups of data items) as seeds. Examples of various seed generation strategies are described below. In various embodiments, processes and methods of applying money laundering indicators to entity activity data to determine entities likely to be involved in money laundering may be referred to as seed generation strategies/rules.

Further, in various embodiments, the data analysis system may automatically discover data items related to a seed, and store the resulting relationships and related data items together in a “cluster.” A cluster generation strategy (also referred to as “cluster generation rules”) may specify particular searches to perform at each step of an investigation, or cluster generation, process. Such searches may produce layers of related data items to add to the cluster. Thus, according to an embodiment, an analyst may start an investigation with the resulting cluster, rather than the seed alone. Starting with the cluster, the analyst may form opinions regarding the related data items, conduct further analysis of the related data items, and/or may query for additional related data items.
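The layered search described above can be pictured as a breadth-first traversal over links between data items. The Python sketch below is an illustrative assumption rather than the disclosed cluster generation strategy: the function name `generate_cluster`, the `links` adjacency mapping, and the `max_layers` limit are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def generate_cluster(seed: str, links: Dict[str, Set[str]], max_layers: int = 2) -> Set[str]:
    """Collect the seed plus every data item reachable within max_layers links."""
    cluster = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        item, layer = frontier.popleft()
        if layer == max_layers:
            continue  # do not search beyond the configured number of layers
        for neighbor in links.get(item, set()):
            if neighbor not in cluster:
                cluster.add(neighbor)
                frontier.append((neighbor, layer + 1))
    return cluster
```

With a seed financial account user linked to an IP address that is in turn linked to a second user, a one-layer search returns only the directly linked items, while a two-layer search also returns the second user.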

According to various embodiments, the data analysis system may further generate various “cluster scores.” Cluster scores may include scores based on various characteristics and/or attributes associated with the cluster and/or the various data items of the cluster. In various embodiments, the data analysis system may also generate “cluster metascores” which may include, for example, an overall cluster score. Cluster metascores may, for example, be based on a combination of cluster scores of a cluster associated with a seed. The processes of scoring entity data items for money laundering, and scoring clusters of entity activity data items, may be similar and the description of either (below) may also apply to the other.

Further, in various embodiments, for a particular set of data items, multiple clusters may be generated by the data analysis system. For example, the data analysis system may generate multiple seeds (for example, the corresponding entity data items for identified financial accounts, computing devices, IP addresses, etc.) according to a seed generation strategy, and then multiple clusters based on those seeds (and based on a cluster generation strategy). In such embodiments, the data analysis system may prioritize the multiple seeds and/or generated clusters based upon money laundering and/or cluster scores and/or cluster metascores. In an embodiment, the data analysis system may provide a user interface including a display of summaries of the entities and/or clusters, including scores, metascores, and/or various other information. Such summaries may be displayed according to a prioritization of entities and/or clusters. In various embodiments, such prioritization may assist an analyst in selecting particular entities and/or clusters to investigate.

In the following description, numerous specific details are set forth to provide a more thorough understanding of various embodiments of the present disclosure. However, it will be apparent to one of skill in the art that the systems and methods of the present disclosure may be practiced without one or more of these specific details.

III. Examples of Data Items, Properties, and Links

In various embodiments, different types of data items may have different property types. For example, a “financial account” data item may have an “Account Number” property type, a “Person” data item may have an “Eye Color” property type, a “Device” data item may have a “browser” or “browser version” property type, and an “Event” data item may have a “Date” property type. Each property as represented by data in a database may have a property type defined by an ontology used by the database. Further, data items may be instantiated in a database in accordance with a corresponding object definition for the particular data item in the ontology. For example, a specific monetary transaction or payment (for example, an entity of type “event”) of US$30,000.00 (for example, a property of type “currency” having a property value of “US$30,000.00”) taking place on 6/30/2012 (for example, a property of type “date” having a property value of “6/30/2012”) may be stored in the database as an event object with associated currency and date properties as defined within the ontology.

Data items defined in an ontology may support property multiplicity. In particular, a data item may be allowed to have more than one property of the same property type. For example, a “Device” data item may be associated with multiple financial accounts (“financial account” properties), locations (“Geolocation” properties) or IP addresses (“IP Address” properties), or a “Person” data object may have multiple “Address” properties or multiple “Name” properties.
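Property types and property multiplicity can be sketched with a small container in which each property type maps to a list of values. This Python example is a minimal sketch under stated assumptions: the `DataItem` class and its method names are hypothetical and do not represent the ontology implementation of the disclosure.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List


class DataItem:
    """A data item whose properties are keyed by property type.

    Each property type may hold several values (property multiplicity).
    """

    def __init__(self, item_type: str) -> None:
        self.item_type = item_type
        self.properties: DefaultDict[str, List[Any]] = defaultdict(list)

    def add_property(self, property_type: str, value: Any) -> None:
        """Append a value under the given property type."""
        self.properties[property_type].append(value)

    def values(self, property_type: str) -> List[Any]:
        """Return a copy of the values stored for a property type (empty if none)."""
        return list(self.properties.get(property_type, []))
```

For example, a “Device” data item could receive two “IP Address” values through two calls to `add_property`, and an “Event” data item could hold one “Currency” value and one “Date” value.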

A link represents a connection between two data items and may be through any of a relationship, an event, and/or matching properties. A link may be asymmetrical or symmetrical. For example, “Person” data item A may be connected to “Person” data item B by a “Child Of” relationship (where “Person” data item B has an asymmetric “Parent Of” relationship to “Person” data item A), a “Kin Of” symmetric relationship to “Person” data item C, and an asymmetric “Member Of” relationship to “Organization” data item X. The type of relationship between two data items may vary depending on the types of the data items. For example, “Person” data item A may have an “Appears In” relationship with “Document” data item Y or have a “Participate In” relationship with “Event” data item E. As an example of an event connection, two “Person” data items may be connected by an “Airline Flight” data item representing a particular airline flight if they traveled together on that flight, or by a “Meeting” data item representing a particular meeting if they both attended that meeting. In one embodiment, when two data items are connected by an event, they are also connected by relationships, in which each data item has a specific relationship to the event, such as, for example, an “Appears In” relationship.

As an example of a matching properties connection, two “Device” data items may both have an “IP Address” property that indicates an IP address associated with the devices, and that IP address may be the same for both, or may be on the same network or subnet, even if different. In another example, two “IP Address” data items may both have a “Geolocation” property that indicates that these addresses are assigned within the same physical location. If the locations are within the same general area, then their “Geolocation” properties likely contain similar, if not identical, property values. In one embodiment, a link between two data items may be established based on similar or matching properties (for example, property types and/or property values) of the data items. These are just some examples of the types of connections that may be represented by a link and other types of connections may be represented; embodiments are not limited to any particular types of connections between data items. For example, a web log may contain references to two different entities, such as a source financial account (one data item) and a destination financial account (a second data item). A link between these two data items may represent a connection between these two entities through their co-occurrence within the same web log.
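The "same network or subnet" comparison mentioned above can be sketched with Python's standard `ipaddress` module. The helper name `same_subnet` and the default prefix length of 24 bits are assumptions made for this example; the disclosure does not specify how subnet membership is determined.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_length: int = 24) -> bool:
    """Return True when ip_b falls within the network of ip_a at the given prefix length."""
    # strict=False lets a host address be used to derive its enclosing network.
    network = ipaddress.ip_network(f"{ip_a}/{prefix_length}", strict=False)
    return ipaddress.ip_address(ip_b) in network
```

Two “Device” data items whose “IP Address” values satisfy such a test could then be linked by a matching properties connection even though the addresses themselves differ.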

In some embodiments, each line of a web log, or of any collection of data indicating how a financial account was used, such as a log or data created by a bank's mobile application (all of which may be referred to as comprising account usage data items or entity activity data items (e.g. log lines, entries, and/or records)), can comprise multiple identifiers in a single log line. For example, a single log line may include a time stamp, the accessed financial account, the user accessing the financial account, the device accessing the financial account (represented via a device ID, a fingerprint, or one or more characteristics of a device that allow for a unique or near-unique identification of the device), the IP address of the device accessing the financial account, and any action, activity, or function initiated by the user or device on the bank's website or mobile application.
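To make the idea of several identifiers in one log line concrete, the Python sketch below parses a line into its identifiers. The `key=value` layout and the field names (`ts`, `account`, `user`, `device`, `ip`, `action`) are hypothetical; the disclosure does not define a log format, and real web logs will differ.

```python
from typing import Dict


def parse_log_line(line: str) -> Dict[str, str]:
    """Split a whitespace-separated 'key=value' log line into a dictionary of identifiers."""
    fields: Dict[str, str] = {}
    for token in line.split():
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


# Example line in the assumed format.
EXAMPLE_LINE = (
    "ts=2014-08-12T10:15:00Z account=ACCT-1 user=user-7 "
    "device=dev-42 ip=192.0.2.10 action=transfer"
)
```

Each identifier extracted from the same line (account, user, device, and IP address) co-occurs in one record, which is the kind of co-occurrence that may be represented by a link between the corresponding data items.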

Each data item may have multiple links with another data item to form a link set. For example, two “Device” data items may be linked through a “geolocation” relationship, a matching “time zone” property, and/or one or more matching “Event” properties (for example, accesses the same financial account). Each link, as represented by data in a database, may have a link type defined by the database ontology used by the database.

In various embodiments, the data analysis system may access various data items and associated properties from various databases and data sources. Such databases and data sources may include a variety of information and data, such as, for example, personal information (for example, names, addresses, phone numbers, personal identifiers, and the like), financial information (for example, financial account information, transaction information, balance information, and the like), tax-related information (for example, tax return data, and the like), device related data, computer network-related data (for example, network traffic information, IP (Internet Protocol) addresses, user account information, domain information, network connection information, web logs (including financial account access logs) and the like), and/or computer-related activity data (for example, computer events, user actions, money laundering black lists or grey lists and the like), entity histories, previous investigations, among others.

IV. Description of the Figures

Embodiments of the disclosure will now be described with reference to the accompanying Figures, wherein like numerals refer to like elements throughout. The terminology used in the description presented herein is not intended to be interpreted in any limited or restrictive manner, simply because it is being utilized in conjunction with a detailed description of certain specific embodiments of the disclosure. Furthermore, embodiments of the disclosure may include several novel features, no single one of which is solely responsible for its desirable attributes or which is essential to practicing the embodiments of the disclosure herein described.

a. Example Data Analysis System

FIG. 1 is a block diagram illustrating an example data analysis system 100, according to an embodiment. As shown in the embodiment of FIG. 1, the data analysis system 100 includes an application server 115 running on a server computing system 110, a client 135 running on a client computer system 130, and at least one data store 140. Further, the client 135, application server 115, and data store 140 may communicate over a network 150, for example, to access data sources 160.

The application server 115 may include a rules engine 120 and a workflow engine 125. The rules engine 120 and the workflow engine 125 may be software modules as described below in reference to FIG. 5. According to an embodiment, the rules engine 120 is configured to apply one or more money laundering seed generation strategies (e.g., including one or more money laundering rules or indicators) to various entity data items, score entity data items based on money laundering scoring criteria (e.g., as defined in the one or more seed generation strategies), build one or more clusters of related data items according to one or more cluster generation strategies, and/or score clusters based on cluster scoring criteria (e.g., as defined in the one or more seed generation strategies). The rules engine 120 may read data from a variety of data sources 160 to obtain entity data items, obtain entity activity data items, obtain other types of data items, and/or generate clusters from seeds (also referred to as “seed data items”). Once processed, entity data items (and other types of data items) and/or clusters may be stored on the server computing system 110 and/or on the data store 140. The operations of the rules engine 120 are discussed in detail below in reference to FIGS. 2A-2D and 3A-3C.

As mentioned, in an embodiment, the rules engine 120 may be configured to score the entity data items and/or data item clusters according to money laundering scoring criteria and/or cluster scoring criteria (e.g., as defined in a cluster scoring strategy). A score may indicate a likelihood that an entity or cluster is involved or otherwise associated with money laundering and/or an importance of further analyzing an entity or cluster. For example, with respect to a cluster likely to be associated with money laundering, including several financial accounts, the rules engine 120 may execute a scoring strategy that aggregates the account balances of the financial accounts within the cluster. Because, for example, a large aggregated total balance may indicate a large money laundering liability for a financial institution, a cluster with such a large total balance may be considered to have a higher score relative to other clusters with lower aggregated total balances (and, therefore, lower scores). Thus, a cluster with a higher score relative to a cluster with a lower score may be considered more important to analyze.
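The balance-aggregation example above can be sketched as follows. This is a minimal Python illustration, assuming that balances are available in a mapping keyed by account identifier; the function names and the `balances` mapping are hypothetical and are not the rules engine 120 itself.

```python
from typing import Dict, Iterable, List, Tuple


def aggregate_balance_score(cluster: Iterable[str], balances: Dict[str, float]) -> float:
    """Sum the balances of the financial accounts in a cluster.

    Items without a known balance (devices, IP addresses, and so on) contribute nothing.
    """
    return sum(balances.get(item, 0.0) for item in cluster)


def rank_clusters(
    clusters: Dict[str, List[str]], balances: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (cluster name, score) pairs ordered from highest to lowest score."""
    scored = [
        (name, aggregate_balance_score(items, balances))
        for name, items in clusters.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this sketch, the cluster with the largest aggregated balance appears first in the ranking and would therefore be presented as the most important to analyze.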

In an embodiment, the rules engine 120 organizes and presents the entities and/or clusters according to the assigned scores. The rules engine 120 may present summaries of the entities and/or clusters, and/or interactive representations of the clusters within an analysis user interface. For example, the representations may provide visual indications (for example, graphs or other visualizations) of entity data items and/or related data items within various clusters. The rules engine 120 may generate an analysis user interface, such as a web application and/or a dynamic web page displayed within the client 135. The rules engine 120 may also allow an analyst to create tasks associated with the entities and/or clusters. In an embodiment, the rules engine 120 evaluates entities and/or generates clusters automatically, for example, for subsequent review by analysts.

The application server may further include a workflow engine 125. The workflow engine 125 may generate and/or provide the various user interfaces of the data analysis system. Analysts may assign tasks to themselves or one another via a workflow user interface generated by the workflow engine 125, for example. In another example, the workflow engine 125 may present various data generated by the rules engine 120. For example, the workflow engine 125 may present an analyst with entities and/or clusters generated, scored, and ordered by the rules engine 120.

The client 135 may represent one or more software applications or modules configured to present data and translate input, from the analyst, into requests for data analyses by the application server 115. In one embodiment, the client 135 and the application server 115 may be embodied in the same software module and/or may be included in the same computing system. However, several clients 135 may execute on the client computer 130, and/or several clients 135 on several client computers 130 may interact with the application server 115. In one embodiment, the client 135 may be a browser accessing a web service. In various embodiments, a component of the system, for example the workflow engine 125, may generate user interfaces (for example, that may be transmitted to a display or browser and displayed to an analyst) and/or may generate instructions or code useable to generate a display and/or user interface (for example, that may be transmitted to a display or browser where a user interface may be generated and displayed to an analyst).

While the client 135 and application server 115 are shown running on distinct computing systems, the client 135 and application server 115 may run on the same computing system. Further, the rules engine 120 and the workflow engine 125 may run on separate application servers 115, on separate server computing systems, or some combination thereof. Additionally, a history service may store the results generated by an analyst relative to a given entity and/or cluster.

In an embodiment, the data sources 160 provide data available to the rules engine 120, for example entity activity data, which may include financial account usage data. The data sources 160 may provide various data items such as entity activity data items to the rules engine 120 for determination of entities likely involved with money laundering (for example, generation of seeds) and/or to create or generate clusters from a seed or a set of seeds. Such data sources may include relational data sources, web services data, XML data, and the like. Further, such data sources may include a variety of information and data, for example, web log information (including access logs and network interactions of devices and/or IP addresses initiating actions on accounts, such as a bank transfer), personal information, financial information, tax-related information, computer network-related data, computer-related activity data, device, IP address, or money laundering black or grey lists, entity investigation histories, and/or entity money laundering histories, among others. For example, the data sources may be related to financial account records stored by a financial institution, and operations on those accounts (such as bank transfers) by users, devices and/or IP addresses. In such a case, the data sources may include entity activity data, such as financial operations taken by a user, device or IP address (deposits, transfers, purchases, withdrawals, etc.), location data for the devices or IP addresses (or a lookup service to determine such a device's or IP address's location via geolocation of an IP address, device time zone, or other means), financial account data, customer data, and transaction data. The entity activity data may include data attributes such as device characteristics, HTTP headers/user agent information, IP addresses, account numbers, account balances, phone numbers, persons, addresses, transaction amounts, and/or the like. Of course, data sources 160 are included to be representative of a variety of data available to the server computing system 110 over network 150, as well as locally available data sources.

The data store 140 may be a Relational Database Management System (RDBMS) that stores the data as rows in relational tables. The term “database,” as used herein, may refer to a database (e.g., RDBMS or SQL database), or may refer to any other data structure, such as, for example, a comma separated values (CSV) file, extensible markup language (XML) file, text (TXT) file, flat file, spreadsheet file, and/or any other widely used or proprietary format. While the data store 140 is shown as a distinct computing system, the data store 140 may operate on the same server computing system 110 as the application server 115.

Additional details of the server computing system 110, the data sources 160, and other components of the data analysis system are described below in reference to FIG. 5.

b. Example Methods of the Data Analysis System

FIG. 1B is a flowchart of an example generalized method of the data analysis system, according to various embodiments of the present disclosure. In various embodiments, fewer blocks or additional blocks may be included in the process, or various blocks may be performed in an order different from that shown in the figure. In an embodiment, one or more blocks in the figure may be performed by various components of the data analysis system, for example, server computing system 110 (described in reference to FIG. 1 above and FIG. 5 below).

The flowchart of FIG. 1B shows an overview of various processes of the data analysis system, the details of which are described in reference to the flowcharts of FIGS. 2A-2B below. By the method of FIG. 1B, the system may detect and/or identify potential computing devices, IP addresses, and financial accounts (or users/user IDs associated with such financial accounts) associated with money laundering activity.

At block 172, the system may determine one or more entities and their corresponding entity data items (e.g., representing entities or other data items) satisfying various money laundering indicators. Details of an example of this process are described below in reference to FIG. 2A. At block 174, the system may determine one or more money laundering scores and/or metascores for each of the entity data items corresponding to the entities. Details of an example of this process are described below in reference to FIG. 2B. At optional block 176, the system may rank the entity data items corresponding to the entities based on the determined money laundering scores. An example of ranked entities is shown in an example user interface in FIG. 4. At optional block 178, the system may generate clusters of related data items based on the entity data items determined to be likely involved with, or associated with, money laundering, as determined by the data analysis system based on the money laundering scores for each entity in block 174. Details of an example of this process are described in reference to FIG. 2C. At optional block 180, the system may determine one or more cluster scores and/or metascores for each of the clusters. Details of an example of this process are described below in reference to FIG. 2D. At optional block 182, the system may rank the clusters based on the determined cluster scores and/or metascores. An example of ranked clusters is also shown in the example user interface of FIG. 4. As described, each of blocks 176-182 is optional, and in some embodiments may not be performed by the system.

In various embodiments, the data analysis system may or may not generate multiple scores for each entity and/or cluster, may or may not generate metascores for each entity/cluster, and/or may or may not rank the entities/clusters. In an embodiment, the system may rank entities/clusters based on one or more scores that are not metascores.
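As a non-limiting illustration of the optional ranking of block 176, the following minimal sketch orders entities by a single score. The function name `rank_entities`, the entity identifiers, and the tie-breaking rule are assumptions made for illustration only, and the sketch assumes an embodiment in which a higher score indicates a higher likelihood of money laundering.

```python
def rank_entities(scores):
    """Return entity identifiers ordered from highest to lowest score.

    `scores` maps a hypothetical entity identifier to one numeric score.
    Ties are broken alphabetically by identifier so the order is stable.
    """
    return sorted(scores, key=lambda entity_id: (-scores[entity_id], entity_id))


# Hypothetical scores for three entities of different types.
example_scores = {"device:XY": 5, "user:ABCD": 10, "ip:192.0.2.5": 5}
ranked = rank_entities(example_scores)
```

In an embodiment where the scoring order is reversed, the sign in the sort key would simply be dropped.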

Further, as described below, in an embodiment entity data items satisfying a money laundering score threshold may be used as seeds for generating data item clusters. Alternatively, data item clusters may be generated for any determined entity/entities. Seeds may include one or multiple data items. Similarly, the clusters may include one or multiple data items related to a seed, including the seed, and may be generated based on cluster strategies or rules (also referred to as “cluster generation strategies”). Scores and metascores may be determined based on attributes, characteristics, and/or properties associated with an entity data item and/or data items that make up a given cluster.

In various embodiments, in each of the flowcharts described below in reference to FIGS. 2A-2D, fewer blocks or additional blocks may be included in the example processes depicted, or various blocks may be performed in an order different from that shown in the figures. Further, in various embodiments, one or more blocks in the figures may be performed by various components of the data analysis system, for example, server computing system 110 (described above in reference to FIGS. 1 and 5) and/or another suitable computing system.

FIG. 2A is a flowchart of an example of an entity activity data analysis method of the data analysis system in which entity activity data is analyzed based on one or more money laundering indicators or rules, according to various embodiments of the present disclosure. The flowchart of FIG. 2A may correspond to block 172 of the flowchart of FIG. 1B. As described below, the one or more money laundering indicators, when applied to data related to an entity data item, may indicate a likelihood that the entity data item is involved in or associated with money laundering. For example, an entity satisfying one or more of the money laundering indicators may indicate that the entity is potentially involved in money laundering activity. This is because, at least in part, entities involved in money laundering may exhibit particular behaviors, and/or may have particular characteristics, that may be identified by the system by application of the money laundering indicators.

As described above, money laundering activity may include, for example, a person or group engaging in financial activity to illegally transfer money or disguise the source of money. Examples of money laundering may include transfers of money, or obfuscation of the source of money, so as to transform proceeds of illegal activity into ostensibly legitimate money or legitimate assets. Money laundering may also include any misuse of the financial system (usually performed via money transfer), including terrorism financing, tax evasion, international sanction evasion, import/export restrictions evasion, or other prohibited activities. Money laundering may include avoiding prohibitions that may relate to the source of a transfer (e.g., money from criminal activities such as extortion, insider trading, drug trafficking, illegal gambling, tax evasion, etc.), or the destination of a transfer (e.g., money destined for a terrorist organization or to be imported to a specific country such as the US). Money laundering may also include the layering, integrating, or combining of illegal funds with legitimate funds, so as to conceal the source of funds. Money laundering may also include, in addition to money transfers, the purchase of assets that can conceal a source, such as securities, digital currencies (e.g., bitcoin), credit cards, cash, coupons, virtual currency, on-line game credits, etc. Money laundering may include cross nation access/transfers, smurfing, currency exchanges, double-invoicing, structuring, bulk cash smuggling, cash-intensive businesses, trade-based laundering, shell companies and trusts, round-tripping, bank capture, casinos, real estate, tax amnesties, fictional loans, etc.

Advantageously, according to various embodiments, the system may enable analysts of, for example, financial institutions such as banks, to efficiently identify entities that are (or were) potentially involved in money laundering (for example, computing devices that access financial accounts, where those computing devices all meet specific criteria, such as a pattern of movement of money between physical locations of the computing devices).

Referring again to the flowchart of FIG. 2A, at block 212, entity activity data and/or data items may be received and/or accessed by the data analysis system. For example, entity activity data items may include web logs (212 a), mobile application data (212 b), any other device related data (212 c), and/or any other financial entity activity data (such as bank transfers or financial account accesses) and/or other user-related data (212 d), or other data sources which may provide a signal to detect money laundering. Such entity activity data may be accessed and/or received from, for example, various data sources 160 (as shown in FIGS. 1 and 5). In various embodiments, entity activity data may be accessed from various data sources, as described herein. Further, entity activity data accessed from the various data sources may be normalized and/or adapted such that it may be read and/or manipulated by the system (as described below in reference to FIG. 5).

Web logs (212 a) may comprise multiple log lines describing accesses/network transactions between a browser client and a banking or financial server. Each log line may comprise a timestamp of an HTTP/HTTPS request, a source IP address, a username of a customer/person accessing the financial website, an account being accessed, and/or a device ID or fingerprint (such as an active directory token, persistent cookie, hash ID, or one or more characteristics providing enough entropy to uniquely identify or classify a device). The web log may also indicate specific actions taken by devices/users on a website, such as access to web pages or functions that indicate a bank transfer, withdrawal, or deposit. Multiple logs may be used to stitch together a full understanding of actions on a website (such as correlating device/cookie IDs in a web log with database logs indicating user actions during similar time periods and/or using the same IP address, etc.), and such a collection of data may be considered a data source (or data sources) used by the data analysis system.

Mobile application data (212 b) may comprise data locally stored by a mobile application and sent over a network. For example, many banking institutions have their own mobile applications to access financial accounts. These mobile applications may run on many devices, including Android devices, iOS devices, etc. As mobile applications may allow a customer to access a financial institution's accounts, these applications may collect information about the mobile device and its interactions. Such information may comprise data similar to the web log data (212 a), but may also include other information, such as the geolocation of the mobile device while accessing the mobile application (e.g., GPS coordinates, travel speed, accelerometer information, time zone/clock setting information, etc.). Any information collected by the mobile application on the mobile device may then be sent to a server or database, to be accessed as a data source (or data sources).

Other device related data (212 c) may include a variety of information, such as device black lists or histories of activities for known devices (e.g., tracked by fingerprint, characteristics, cookie, etc.). Additionally, such data may be generated on the fly. For example, each IP address used by a device, or appearing in a log, may have an associated approximate geolocation. A network service may be used to analyze each IP address associated with a device to determine the approximate geolocation of the device during a specific time period in which each IP address was used, during a specific transaction, etc. In some embodiments, a network service need not be used. Instead, a local table or database that maps IP addresses to specific geolocations may be consulted in order to determine an IP address's or device's location. Such a lookup may be a part of preprocessing of the data sources 212 a, 212 b, 212 c, and 212 d, or may be performed on the fly by the data analysis system.
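As a non-limiting illustration of the local-table lookup described above, the following minimal sketch maps an IP address to a country code using only the Python standard library. The table `GEO_TABLE`, the function `lookup_country`, and the network-to-country assignments are hypothetical; the networks are reserved documentation ranges and do not reflect real geolocation data.

```python
import ipaddress

# Hypothetical local table mapping IP networks to country codes.
# The networks are reserved documentation ranges; the countries are made up.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "VE"),
    (ipaddress.ip_network("198.51.100.0/24"), "MX"),
    (ipaddress.ip_network("203.0.113.0/24"), "US"),
]


def lookup_country(ip, table=GEO_TABLE):
    """Return the country code for an IP address, or None if it is unmapped."""
    address = ipaddress.ip_address(ip)
    for network, country in table:
        if address in network:
            return country
    return None
```

A production table would likely be far larger and indexed for fast lookup; the linear scan here is only meant to show the shape of the mapping.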

Other user related data sources (212 d) may include lists of financial accounts, their associated users/customers, known financial accounts associated with money laundering, customer black lists, money laundering investigation histories, or other data. A customer black list may comprise a list of known persons or business entities that have been designated as associated with money laundering, and should be avoided by all banks. The list may comprise personally identifiable information (e.g., a Person object's properties or known devices of the Person) that can be used for detecting relationships to other data to determine other associated entities (e.g., a device that may have been used by a black listed person).

Each of the above data sources may comprise one or more entity activity data items including account usage information, as each data source usually includes multiple entries (e.g. log lines) indicating a relationship of each entity with a financial account (e.g. using variable placeholders, entity X at time T accessed account Z from IP A.B.C.D to perform account function W).

In some embodiments, the entities examined by the data analysis system may be generated by extracting entities from a portion of, or all of, the data sources comprising entity activity data items. For example, the data analysis system may extract device identifiers, user identifiers, financial accounts, and IP addresses from a bank's web logs or transaction logs, in order to create a list of a plurality of entities to examine.
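As a non-limiting illustration, the sketch below models a normalized activity entry of the form "entity X at time T accessed account Z from IP A.B.C.D to perform account function W" and extracts the distinct entities from a list of such entries. The record layout `ActivityRecord`, its field names, and the function `extract_entities` are assumptions for illustration; actual log formats will vary by data source.

```python
from collections import namedtuple

# Hypothetical normalized record for one log entry.
ActivityRecord = namedtuple(
    "ActivityRecord",
    ["timestamp", "user_id", "device_id", "ip", "account", "action"],
)


def extract_entities(records):
    """Collect the distinct entities of each type that appear in the records."""
    entities = {"user": set(), "device": set(), "ip": set(), "account": set()}
    for record in records:
        entities["user"].add(record.user_id)
        entities["device"].add(record.device_id)
        entities["ip"].add(record.ip)
        entities["account"].add(record.account)
    return entities


# Two made-up entries for the same user on different devices.
records = [
    ActivityRecord("2011-03-24T10:00:00", "ABCD", "XY", "192.0.2.5", "12345", "login"),
    ActivityRecord("2011-04-04T09:30:00", "ABCD", "WZ", "198.51.100.9", "12345", "transfer"),
]
entities = extract_entities(records)
```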

At block 214, the system may determine one or more entities with which the received data are related. For example, the received entity activity data may include various devices IDs (fingerprints, HTTP characteristics, etc.), IP addresses, customers, transactions, phone numbers, addresses, person identifiers, and/or the like that may each be related to one or more entities. The system may determine which data items are related with which entities such that the entities may be evaluated for potential money laundering, as described below. For example, for each device or IP address, the data analysis system may look up or determine the device's physical location for one or more time periods based on a web log entry, mobile application data, a geolocation mapping, and/or a geolocation lookup service.

At block 216, various money laundering indicators or rules may be received and/or accessed by the data analysis system. Money laundering indicators may include, for example, various data, rules, and/or criteria that may be compared with the entity activity data items (for example, device/IP address bank web service/mobile application transaction information) to determine whether an entity is likely involved in money laundering.

In some embodiments, a money laundering indicator may use the physical location of access by an entity to a financial account. As described herein, the data analysis system can examine an entity's activity through the use of data sources comprising web logs or mobile application logs of a bank's interactions with customer devices. The logs may provide multiple access times for each device, the customer associated with access (such as a user ID that is affiliated with the financial account and associated with a user/customer profile), a device fingerprint (such as an Active Directory token, persistent cookie, browser characteristics, etc.), a device's location during the activity, and/or an IP address. In some embodiments, geolocation data for activity may be discovered by looking up an IP address in a geolocation database, or may be supplied by a mobile device in a log as activity is occurring.

The entity activity data (also known as account usage data) above can be used to track and determine an entity's physical movements. For example, the data could show that on Mar. 24, 2011, user ABCD accessed financial account 12345 using device XY and IP address 128.200.83.5. The IP used may correspond to an IP address associated with Venezuela as determined by an IP geolocation database. The same user ABCD may have then accessed financial account 12345 on Apr. 4, 2011 using his mobile device WZ and IP address 34.48.98.123. The mobile device may have used a banking application that could read the mobile device's GPS coordinates (as determined by the mobile device) at the time of access. This information may have been reported to the bank's servers and recorded in the log. Using a GPS coordinate database, the GPS coordinates may indicate that the device was in Mexico. On Apr. 7, 2011, user ABCD may have accessed the same financial account with device XY and an IP address affiliated with the United States. Using this information, the data analysis system can determine a path that the user is travelling. The data analysis system can group or sort access information by device, or user, etc. so that patterns of access can emerge for each device or user.
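As a non-limiting illustration of grouping and sorting access information so that a path emerges, the sketch below groups accesses by user and orders them by date, using the dates and countries from the example above. The tuple layout and the function name `paths_by_entity` are assumptions for illustration, and the country of each access is assumed to have been resolved already.

```python
from collections import defaultdict
from datetime import date

# (entity, access date, country code) tuples, deliberately out of order.
accesses = [
    ("ABCD", date(2011, 4, 7), "US"),
    ("ABCD", date(2011, 3, 24), "VE"),
    ("ABCD", date(2011, 4, 4), "MX"),
]


def paths_by_entity(accesses):
    """Group accesses by entity and sort each group chronologically."""
    grouped = defaultdict(list)
    for entity, when, country in accesses:
        grouped[entity].append((when, country))
    return {entity: sorted(visits) for entity, visits in grouped.items()}


paths = paths_by_entity(accesses)
# Countries visited by user ABCD, in chronological order.
route = [country for _, country in paths["ABCD"]]
```

The same grouping could be keyed by device identifier or IP address instead of user.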

Such a path or pattern can be compared to money laundering indicators (such indicators or rules can also be referred to as money laundering profiles). For example, a rule may specify that users or devices traveling through explicitly risky countries or exhibiting a certain profile of behavior (such as moving across known drug trade/trafficking routes) can indicate a likelihood of money laundering activity associated with the financial accounts accessed by those users or devices. For example, one money laundering indicator could be satisfied if a user or device accessed a financial account from at least two risky jurisdictions. Another example money laundering indicator could require a pattern of movement to be satisfied.

For example, in FIG. 3A, Colombia 391, Venezuela 392, and Argentina 393 could comprise a first area of an indicator (1), Ecuador 395, Mexico 394, and the Bahamas 396 could comprise a second area of the indicator (2), and the United States 397 could comprise the last area in the indicator (3). The indicator could be satisfied if there was any movement detected (via the bank access) from the first area (1), then the second area (2), and then the third area (3), indicating a movement pattern that could be used by drug traffickers. Such a movement pattern could have a time requirement. For example, an indicator may be created that requires movement between the first area (1) and the second area (2) to occur within 3 days, and the movement between the second area (2) and the third area (3) to occur within 5 days of the first movement. Such an order of movement, or pattern of movement, is but one example of a money laundering indicator that could be created for the data analysis system to use in comparison to the activities of all the entities the data analysis system is aware of.
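As a non-limiting illustration, the ordered-movement indicator with time requirements could be sketched as follows. The function name, the country-code sets, and the reading of the time requirement (the second movement must occur within 5 days of the access that completed the first movement) are assumptions for illustration; other readings of the time windows are possible.

```python
from datetime import datetime, timedelta

# Areas of the example indicator, expressed as hypothetical country codes.
AREA_1 = {"CO", "VE", "AR"}
AREA_2 = {"EC", "MX", "BS"}
AREA_3 = {"US"}


def matches_movement_pattern(path, first_leg=timedelta(days=3),
                             second_leg=timedelta(days=5)):
    """Return True if accesses occur in area 1, then 2, then 3 within the windows.

    `path` is a list of (datetime, country_code) tuples for one entity.
    """
    path = sorted(path)
    for i, (t1, c1) in enumerate(path):
        if c1 not in AREA_1:
            continue
        for j in range(i + 1, len(path)):
            t2, c2 = path[j]
            # The move into area 2 must happen within the first window.
            if c2 not in AREA_2 or t2 - t1 > first_leg:
                continue
            for t3, c3 in path[j + 1:]:
                # The move into area 3 must follow within the second window.
                if c3 in AREA_3 and t3 - t2 <= second_leg:
                    return True
    return False


fast = [(datetime(2011, 4, 1), "VE"), (datetime(2011, 4, 3), "MX"),
        (datetime(2011, 4, 7), "US")]
slow = [(datetime(2011, 3, 24), "VE"), (datetime(2011, 4, 4), "MX"),
        (datetime(2011, 4, 7), "US")]
```

Here `fast` satisfies the sketch, while `slow` does not because 11 days elapse between the first and second areas.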

Each entity that matches such an indicator could then be used as a seed for a cluster. For example, if a device was an entity that exhibits such movement behaviors and satisfied the indicator, then a cluster can be built by the system that shows other related users that used the device, and the accounts those users accessed, and so on. Likewise, if a user satisfied the movement based rule, the user's profile can be used to determine the customers associated with the user, all other financial accounts of the user, and the user's personally identifiable information (phone numbers, addresses, etc.), each of which can be used again to further expand the cluster into additional cluster levels. The additional clustered information can be useful to find other devices, financial accounts, IP addresses, or users associated with money laundering, either automatically, or by additional analysis by an analyst.
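As a non-limiting illustration of expanding a seed into a cluster over successive cluster levels, the sketch below performs a breadth-first traversal of related data items up to a chosen number of levels. The link table `LINKS`, the item identifiers, and the function `build_cluster` are hypothetical and stand in for whatever cluster generation strategy an embodiment uses.

```python
from collections import deque

# Hypothetical relationships between data items.
LINKS = {
    "device:XY": ["user:ABCD", "user:EFGH"],
    "user:ABCD": ["account:12345", "phone:555-0100"],
    "user:EFGH": ["account:67890"],
    "account:67890": ["user:IJKL"],
}


def build_cluster(seed, links, max_levels=2):
    """Return the set of items reachable from the seed within max_levels hops."""
    cluster = {seed}
    queue = deque([(seed, 0)])
    while queue:
        item, level = queue.popleft()
        if level == max_levels:
            continue  # do not expand beyond the last requested level
        for neighbor in links.get(item, []):
            if neighbor not in cluster:
                cluster.add(neighbor)
                queue.append((neighbor, level + 1))
    return cluster


cluster = build_cluster("device:XY", LINKS, max_levels=2)
```

Limiting the number of levels keeps a cluster focused on items closely related to the seed.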

FIG. 3B shows a table listing other example money laundering indicators, according to various embodiments of the present disclosure. Each of the listed example money laundering indicators, when applied to an entity data item and related data (for example, transaction information), may provide an indication of potential money laundering. For example, it may be known that device accesses of financial accounts from physical locations associated with drug trafficking may indicate money laundering potential. Accordingly, a money laundering indicator/rule may analyze the various entity data items to determine whether the device or IP address entities match (for example) a location based money laundering indicator/rule. If so, the entities may have a higher likelihood of being associated with money laundering, as may be indicated by the system.

Examples of money laundering indicators may include, but are not limited to:

-   Access to a financial account from a physical location associated with drug trafficking, or from locations indicating a known pattern of drug trafficking (e.g., Colombia, then Mexico, then the United States).
-   A device's fingerprint is on a “bad fingerprint” list, an IP address is on a “bad IP address” list, or a user is on a known money laundering user list.
-   A device or IP address logs into an account already known to be affiliated with money laundering, or a user logs into an account with a device or IP address already known to be affiliated with money laundering.
-   A user, IP address, or device accessing the financial account has a history of previous money laundering investigations.
-   A user, IP address, or device is involved in a transaction matching money laundering criteria (e.g., threshold value of a transaction, conversion to a specific currency, etc.).
-   Accounts in which a particular dollar amount (or at least a particular dollar amount) is sent from a credit line to a credit union, or is a part of any financial transfer.
-   An IP address or device used to access a financial account is from a known high-risk area for money laundering.
-   A financial account is accessed from multiple physical locations within an unreasonable time period for travelling between the physical locations.
-   A single device accesses multiple financial accounts from distinct physical locations within a threshold time period.
-   A device accesses a financial account during a particular high risk time period for money laundering.
-   A device, user, or IP address exceeds a threshold number of failures at initiating a financial transaction for a financial account.
-   A device's configuration change matches a pattern for money laundering, such as where a device's clock time frequently changes between a Russian time zone and a US time zone, or the device's geolocation as determined by GPS does not match the device's IP address based geolocation.
-   The dates and times of a device's, user's, or IP address's access of a financial account match a timing pattern indicating money laundering.
-   Multiple transactions of a user, device, or IP address indicate a charge for money laundering services. For example, multiple financial accounts each transferring 90% of an initial deposit (indicating each account in a chain is keeping a 10% money laundering fee).
-   A user accessing a financial account is affiliated with a user watch list.
-   A device or IP address accesses financial accounts away from expected locations.
-   A user accesses financial accounts away from a known address of the user.
-   A device accessing a financial account is known to be in a different location than the user affiliated with the device or financial account.
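As a non-limiting illustration of one indicator in the list above (access from multiple physical locations within an unreasonable time period for travelling between them), the sketch below computes the great-circle distance between two accesses with the haversine formula and flags the pair when the implied travel speed exceeds a threshold. The function names, the coordinates used in the example, and the 900 km/h default threshold are assumptions for illustration.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_implausible_travel(access_a, access_b, max_speed_kmh=900.0):
    """Return True if moving between two accesses would exceed max_speed_kmh.

    Each access is a (datetime, latitude, longitude) tuple.
    """
    (t1, lat1, lon1), (t2, lat2, lon2) = sorted([access_a, access_b])
    hours = (t2 - t1).total_seconds() / 3600
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if hours == 0:
        return distance > 0
    return distance / hours > max_speed_kmh


# Approximate coordinates for two cities, used only as sample data.
caracas = (10.48, -66.90)
mexico_city = (19.43, -99.13)
one_hour_apart = is_implausible_travel(
    (datetime(2011, 4, 4, 9, 0), *caracas),
    (datetime(2011, 4, 4, 10, 0), *mexico_city),
)
```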

As indicated by the example rules above, a variety of factors can be used in order to craft money laundering indicators (i.e. by requiring patterns, orders, threshold values of such factors, etc. that may indicate money laundering), including:

-   A device's, user's, and/or IP address's determined physical location (e.g., in comparison to areas of higher or lower risk).
-   Distance of a device or user away from an expected or known location, or any location affiliated with the entity (e.g., a registered postal address).
-   Whether or not a device or user is operating within an expected location of operations.
-   The destination or source of a money transfer.
-   Historical trends of an entity, such as a significant change in the usual banking activity of a device or user.
-   History of investigations of an entity.
-   Blacklists of entities (no fly list, prohibited devices or IP addresses, etc.).
-   A dollar value of a transaction or money transfer.
-   A change or conversion of currency.
-   Detectable percentage “cuts” (e.g., 10% of an amount) in one or more transactions.
-   The time a user or device accesses a financial institution from a particular location (e.g., in comparison to the time required to physically travel between two locations from which access occurred).
-   Expected times of access from a physical location by a device or user.
-   Watch lists of known user accounts (e.g., terrorist or no-fly watch list, etc.).
-   Changes in device identifiers, configuration, or networking (e.g., fake reporting of OS/browser, using VPNs to disguise network location or IP address, etc.).
-   Failed attempts at a transaction (login, money transfer, etc.).
-   Types of transactions, or patterns thereof (ATM, wire, cash, deposit, credit/debit).

Each of the above data and indicators, as well as many others, can also be used as, or to form, money laundering scoring criteria as well, as described herein.

For each of the money laundering indicators listed above (or any indicators that can be crafted with the data or data types listed above) or in other places in the present disclosure, the system may be configured to match at least one of the indicators, a particular number of the indicators, a particular subset of the indicators, and/or at least a particular number of the indicators. Further, one or more of the money laundering indicators, and/or aspects of the money laundering indicators, may be combined and/or rearranged to comprise other or additional money laundering indicators. For example, each of the money laundering indicators listed above that refers to a device may, in some embodiments, be used with IP addresses instead or in combination. Additionally, other money laundering indicators may include, for example, any of the cluster scoring criteria described below. Such money laundering indicators may be derived from automatic or manual analysis of entity activity data. For example, the data analysis system may automatically process large amounts of entity activity data across multiple years to determine various indicators and/or data items.
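As a non-limiting illustration of configuring the system to require at least a particular number of indicators and/or a particular subset of indicators, a matching policy could be sketched as follows. The function `satisfies_policy`, its parameters, and the indicator names are hypothetical.

```python
def satisfies_policy(satisfied, at_least=1, required=frozenset()):
    """Return True if the matched indicators meet the configured policy.

    `satisfied` is the set of indicator names an entity matched, `at_least`
    is the minimum number of matches, and `required` is a subset of
    indicators that must all be present.
    """
    return required <= satisfied and len(satisfied) >= at_least


matched = {"bad_ip_list", "risky_location"}
any_two = satisfies_policy(matched, at_least=2)
needs_watch_list = satisfies_policy(matched, required=frozenset({"watch_list"}))
```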

In each of the above listed examples, various items of information related to entities may be determined and/or accessed by the system. For example, device locations, IP address locations, users or accounts associated with each device/IP address, account numbers, transactions, numbers of transactions, transaction originators and/or recipients (including originating devices/IP addresses), and/or sums of transaction amounts (just to name a few) may be determined by the system and may be used to determine a likelihood of money laundering and/or scoring of particular entities (as described below in reference to FIG. 2B). In a particular example mentioned above, mobile devices associated with (and/or used to access) particular accounts may be determined. Each of the above mentioned data items (including mobile devices, IP addresses, phone numbers, accounts (and related data), persons, and the like) may be used in analyzing entity data items. In various embodiments, various money laundering indicators may be determined (for example, by empirical investigation and/or user preference) to be more or less important in determining a likelihood of money laundering (as described below).

In an embodiment, money laundering indicators may also be derived from known lists, and/or may be provided by an analyst or other user of the data analysis system. In various embodiments, money laundering indicators may be referred to as business rules. Further, in various embodiments and as described above, the money laundering indicators may define suspicious characteristics that may be associated with entity activity data items that may indicate possible money laundering.

Referring again to FIG. 2A, at blocks 218-222, the entity activity data may be analyzed in view of, or compared to, the various money laundering indicators, and particular entity activity data items that may be associated with money laundering are determined. For example, an entity data item may be associated with one or more of the indicators described above. In this example, the entity data item may be determined by the system possibly to be related to money laundering.

Specifically, as indicated by block 218, each of the money laundering indicators may be processed such that, at block 220, it may be determined whether each entity data item (and its associated data) satisfies the indicator/rule. For example, it may be determined whether a particular device is associated with multiple accounts, at least one of which is known for money laundering. At block 222, if not all of the money laundering indicators have been processed, the method continues back to block 218 to process additional indicators. However, if all indicators have been processed, the method continues to FIG. 2B. At this point, each of the money laundering indicators may have been processed, and the system may have determined, for each of the entity data items, whether the entity data item satisfies each of the indicators.
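As a non-limiting illustration of the loop of blocks 218-222, the sketch below applies each indicator to each entity data item and records which indicators each entity satisfies. The entity data layout, the two example indicator functions, and the sample values are assumptions for illustration.

```python
# Hypothetical reference data used by the example indicators.
KNOWN_LAUNDERING_ACCOUNTS = {"67890"}
RISKY_COUNTRIES = {"CO", "VE", "MX"}


def multiple_accounts_one_known(data):
    """Entity touches several accounts, at least one known for laundering."""
    accounts = set(data["accounts"])
    return len(accounts) > 1 and bool(accounts & KNOWN_LAUNDERING_ACCOUNTS)


def two_risky_jurisdictions(data):
    """Entity accessed accounts from at least two risky jurisdictions."""
    return len(set(data["countries"]) & RISKY_COUNTRIES) >= 2


def evaluate_indicators(entities, indicators):
    """Return a mapping of entity id to the set of indicator names satisfied."""
    results = {entity_id: set() for entity_id in entities}
    for name, predicate in indicators.items():      # block 218: next indicator
        for entity_id, data in entities.items():    # block 220: test each entity
            if predicate(data):
                results[entity_id].add(name)
    return results                                  # block 222: all processed


INDICATORS = {
    "multiple_accounts_one_known": multiple_accounts_one_known,
    "two_risky_jurisdictions": two_risky_jurisdictions,
}
ENTITIES = {
    "device:XY": {"accounts": ["12345", "67890"], "countries": ["VE", "US"]},
    "device:WZ": {"accounts": ["12345"], "countries": ["VE", "MX"]},
}
results = evaluate_indicators(ENTITIES, INDICATORS)
```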

Money laundering indicators may be created via a feedback loop, either automatically, or with the assistance of an analyst. Initial money laundering indicators may be created or input into the system by an analyst in order to initiate the data analysis system. Over time, the money laundering indicators may change or additional indicators may be incorporated into the system. These changes or additional indicators can be created either manually or automatically based on feedback.

In an embodiment that incorporates feedback manually, an analyst may create new indicators based on past investigations. For example, if the data analysis system indicates to the analyst over time that many devices tunnel their access through a Russian IP address prior to engaging in money laundering, then the analyst may create an additional money laundering indicator that is satisfied where a device has a Russian IP address. Through this type of feedback process, the data analysis system can improve or be changed to catch trends in money laundering activity.

In an embodiment that incorporates feedback automatically, new indicators may be created using automatic data modeling of the system via a feedback model. Such modeling techniques can take advantage of models known in the art, such as Bayesian filters, neural networks, etc. After an analyst completes an investigation, the analyst may indicate to the data analysis system a positive or negative result for various entities that informs the data analysis system whether or not the entities were actually involved in money laundering. Using this feedback and a modeling technique, the data analysis system can alter its current indicators (e.g., raise or lower a threshold based on the data of known positive and negative money laundering results, add a location to a pattern of locations that indicates money laundering, add users or accounts to a black list based money laundering indicator, etc.), or add a new money laundering indicator (e.g., determine a new physical location that is a high risk money laundering area, determine a new configuration change to a device that may indicate money laundering, such as a specific browser plugin run by many devices involved in money laundering, etc.).

FIG. 2B is a flowchart of an example of an entity scoring method of the data analysis system, according to various embodiments of the present disclosure. The flowchart of FIG. 2B may correspond to block 174 of the flowchart of FIG. 1B. In general, one or more money laundering scoring criteria may be applied to the entity data items and one or more money laundering scores may be determined. Further, higher money laundering scores may generally indicate a higher likelihood that the particular entity is associated with or involved in money laundering, while a lower score may indicate a lower likelihood that the particular entity is associated with or involved in money laundering (or scoring order may be reversed in other embodiments).

At blocks 232-238, each of the entity data items may be analyzed to determine one or more money laundering scores. As mentioned above, at this point each of the money laundering indicators may have been processed and the system may have determined, for each of the entity data items, whether the entity data item satisfies each of the indicators (or some subset of the indicators that is applied to the entity data items). Accordingly, as indicated by block 232, each of the entity data items may be further processed to determine, at block 234, one or more money laundering scores based on money laundering scoring criteria and the satisfied money laundering indicators. In various embodiments, money laundering scoring criteria may include any of the money laundering indicators described above and/or any other scoring criteria and/or strategies described herein.

Examples of money laundering scoring criteria are shown at blocks 234 a-234 d. In various embodiments, money laundering scores may be aggregate scores and/or individual scores. For example, an aggregate score may be a score that takes into account multiple money laundering indicators, while an individual score may be a score that is based on a single money laundering indicator. The data analysis system may generate either or both types of scores, and/or combinations of the two types of scores. In various embodiments, aggregate scores may also be referred to as metascores.

An example of an aggregate money laundering score may be a score that is based on a sum of a number of money laundering indicators satisfied (234 a). For example, it may have been determined that a particular financial account entity satisfied or matched ten money laundering indicators. Accordingly, the account entity may be given an (aggregate) money laundering score of 10. An example of an individual score may be a score that comprises a raw value of a money laundering indicator (234 c). For example, it may have been determined that a particular device is associated with accessing a financial account from 5 jurisdictions that are associated with drug trafficking. Accordingly, the device entity may be given a money laundering score of 5.
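By way of non-limiting illustration, the two kinds of scores described above may be sketched in Python as follows. The representation of an entity as a dictionary, the representation of an indicator as a named predicate, and all identifiers and sample data are hypothetical and are not part of the disclosure.

```python
from typing import Callable, Dict

# Hypothetical representations: an entity is a dict of observed properties,
# and an indicator is a named predicate over that dict.
Entity = Dict[str, object]
Indicator = Callable[[Entity], bool]


def aggregate_score(entity: Entity, indicators: Dict[str, Indicator]) -> int:
    """Aggregate score (234 a): count of indicators the entity satisfies."""
    return sum(1 for satisfied in indicators.values() if satisfied(entity))


def individual_score(entity: Entity, field: str) -> int:
    """Individual score (234 c): raw value of one indicator, here a count."""
    return len(entity.get(field, []))


# Illustrative indicators and data only.
indicators = {
    "flagged_jurisdiction_access": lambda e: len(e.get("flagged_jurisdictions", [])) > 0,
    "on_blacklist": lambda e: bool(e.get("blacklisted", False)),
}
device = {"flagged_jurisdictions": ["J1", "J2", "J3", "J4", "J5"], "blacklisted": False}

matched = aggregate_score(device, indicators)                   # indicators satisfied
raw_value = individual_score(device, "flagged_jurisdictions")   # jurisdictions counted
```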

Each entity may be given multiple money laundering scores, and each money laundering score may be an aggregate score, an individual score, and/or a combination of the two. In an embodiment, each money laundering score may include a raw value and/or a weighted value. For example, a given device entity may be associated with access from two jurisdictions associated with drug trafficking. As described above, when a scoring criterion is applied, the raw value of the score generated would be 2. However, the scoring criteria may also include a relative weighting that indicates the importance of the particular indicator and its associated score (“a number of accesses from illegal jurisdictions”) to an overall evaluation of the entity in the context of money laundering (234 b of FIG. 2B). Such a relative weighting, in one embodiment, may be a number between 0 and 1, where a number closer to one indicates a greater importance. For example, if the “number of illegal jurisdictions” score is considered relatively important to an evaluation of the entity, it may be given a relative weight of, for example, “0.7”. On the other hand, a relatively less important score/consideration (for example, the amount of bank transfers above $10,000 that an entity is involved with) may be given a relative weight of, for example, “0.15”. Then, when a score is calculated by the data analysis system, the relative weight may be multiplied by the raw value to arrive at a corresponding relative value. For example, in the “number of illegal jurisdictions” example, where a raw value of 2 was determined, a relative value may be determined by multiplying 2 by 0.7, to arrive at a relative value of 1.4.
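By way of non-limiting illustration, the relative weighting described above may be sketched in Python as follows. The function name `weighted_value` and the range check are assumptions of this sketch; the raw value of 2 and the relative weights of 0.7 and 0.15 are taken from the example above, while the raw value of 4 bank transfers is invented for illustration.

```python
def weighted_value(raw_value: float, relative_weight: float) -> float:
    """Multiply a raw score by its relative weight (234 b).

    The weight is assumed to lie between 0 and 1, with values closer to 1
    indicating greater importance.
    """
    if not 0.0 <= relative_weight <= 1.0:
        raise ValueError("relative weight must be between 0 and 1")
    return raw_value * relative_weight


# "Number of illegal jurisdictions": raw value 2, relative weight 0.7.
jurisdiction_value = weighted_value(2, 0.7)

# Hypothetical count of 4 bank transfers above $10,000, relative weight 0.15.
transfer_value = weighted_value(4, 0.15)
```

In this sketch, a less important indicator contributes less to an overall evaluation even when its raw value is larger.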

In other embodiments, various other methods may be employed by the data analysis system to determine relative values of entity scores. For example, as may be recognized by one of skill in the art, the system may normalize the raw values of each of the scores before applying a relative weighting to arrive at a weighted value. Examples of entity scores presented to an analyst or other user of the data analysis system are shown and described below in reference to FIG. 4. In an embodiment, the importance of particular scores may be determined and/or based on empirical determinations and/or past entity evaluations. For example, over time the system and/or an analyst may determine, based on evaluations of entities by analysts, that particular scores are better indicators of money laundering than others. Such better indicators may be determined to be more important, and may therefore be weighted more heavily by the system. For example, some locations may be more heavily associated with money laundering (such as entities located in Russia or Colombia) and have higher relative scoring versus other locations that may be considered a weaker but still relevant indication of money laundering (Miami) and have lower relative scoring.
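By way of non-limiting illustration, normalizing raw values before applying a relative weighting may be sketched in Python as follows. Min-max scaling across entities is only one possible normalization and is an assumption of this sketch, as are the function names and the sample values.

```python
from typing import Dict


def min_max_normalize(raw_values: Dict[str, float]) -> Dict[str, float]:
    """Rescale one score's raw values across entities into the range 0 to 1."""
    if not raw_values:
        return {}
    low, high = min(raw_values.values()), max(raw_values.values())
    if high == low:
        # All entities share the same raw value, so none stands out.
        return {name: 0.0 for name in raw_values}
    return {name: (value - low) / (high - low) for name, value in raw_values.items()}


def normalized_weighted_values(raw_values: Dict[str, float],
                               relative_weight: float) -> Dict[str, float]:
    """Normalize first, then apply the relative weight to each entity."""
    normalized = min_max_normalize(raw_values)
    return {name: value * relative_weight for name, value in normalized.items()}


# Illustrative raw values of a single score for three hypothetical entities.
raw = {"entity_a": 0, "entity_b": 5, "entity_c": 10}
weighted = normalized_weighted_values(raw, 0.7)
```

Normalizing in this way keeps a score with a large numeric range from dominating scores that are counted on a smaller scale.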

In an embodiment, at optional block 236, a money laundering metascore may be generated for the entity. The money laundering metascore may be based on a combination or aggregation of the individual scores generated in block 234. Alternatively, the metascores may be separately determined scores. In an embodiment, a metascore may be calculated by summing, multiplying, and/or otherwise aggregating or averaging the various individual or weighted scores together. The metascore may, in an embodiment, capture the relative importance of each of the individual scores by weighting each of the individual scores in a manner similar to that described above. In an embodiment, as also described above, the metascore may be a sum of the number of money laundering indicators satisfied by the entity.

At block 238, if all the entities are not yet processed, the method continues back to block 232 to process additional entities. However, if all entities have been processed, the method optionally continues to FIG. 2C. At this point, each of the entities may have been processed, and the system may have determined, for each of the entity data items, one or more entity scores that indicate a likelihood that the associated entity is involved in money laundering.

In various embodiments, metascores and/or scores may advantageously enable an analyst to directly compare and/or prioritize various entities, and/or may advantageously be used by the data analysis system to prioritize a list of entities and/or scores related to an entity. Examples of money laundering scores for entities presented to an analyst or other user of the data analysis system are shown and described below in reference to FIG. 4.

FIG. 2C is a flowchart of an example of a clustering method of the data analysis system, according to various embodiments of the present disclosure. The flowchart of FIG. 2C may correspond to block 178 of the flowchart of FIG. 1B. As described above, the clustering method shown in FIG. 2C is optional. In general, one or more entity data items may be designated as “seeds,” and various related data items may be clustered with the seed. In an embodiment, as shown at block 242, the clustering method may be performed for entity data items with money laundering scores (and/or metascores) that satisfy one or more money laundering score thresholds. Alternatively, the clustering method may be performed for any or all entity data items. In an embodiment, user accounts, computing devices and IP addresses (and/or other entity activity data) exhibiting one or more of the above described criteria may be designated as seeds.

At blocks 242, 244, and 246, each of the determined entity data items, or seeds, may be clustered. Specifically, at block 244, any data items that are related to a particular seed are associated with a cluster that started with that seed. In an embodiment, clustering of data items may be accomplished as data bindings are executed as part of a clustering strategy. Clustered data items may be related by, for example, sharing the same or similar properties, characteristics, and/or metadata. Examples of data items that may be clustered include, but are not limited to: financial accounts or users of financial accounts (as described above, such as a username affiliated with a bank and one or more financial accounts at a bank), computing devices (e.g. computing device identifiers (mobile or otherwise), fingerprints, etc.), IP addresses and/or the like. Other entities may also be clustered in order to assist an analyst in their investigation. For example, in some embodiments, a history of money laundering investigations for a particular entity may be clustered together with the particular related entity.

As indicated by dashed line 245, the cluster generation method may optionally repeat multiple times until, for example, the clustering strategy is completed and/or no additional related data items are found by the system. Additionally, various clusters of data items may be collapsed or merged when common data items and/or properties are determined between the various clusters. Accordingly, in an embodiment the clustering method shown in FIG. 2C may iteratively cluster related data items.
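By way of non-limiting illustration, seed selection, iterative cluster growth, and merging of clusters that share data items may be sketched in Python as follows. The link table, the breadth-first expansion, the threshold value, and all identifiers are assumptions of this sketch; the clustering strategies and data bindings of the disclosure are not reproduced here.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def select_seeds(metascores: Dict[str, float], threshold: float) -> List[str]:
    """Block 242: keep entity data items whose metascore satisfies the threshold."""
    return [name for name, score in metascores.items() if score >= threshold]


def grow_cluster(seed: str, links: Dict[str, List[str]]) -> Set[str]:
    """Block 244, repeated per line 245: add related items until none remain."""
    cluster = {seed}
    frontier = deque([seed])
    while frontier:
        item = frontier.popleft()
        for related in links.get(item, ()):
            if related not in cluster:
                cluster.add(related)
                frontier.append(related)
    return cluster


def merge_overlapping(clusters: Iterable[Set[str]]) -> List[Set[str]]:
    """Collapse clusters that have at least one data item in common."""
    merged: List[Set[str]] = []
    for cluster in clusters:
        cluster = set(cluster)
        remaining = []
        for existing in merged:
            if existing & cluster:
                cluster |= existing      # absorb the overlapping cluster
            else:
                remaining.append(existing)
        remaining.append(cluster)
        merged = remaining
    return merged


# Illustrative links between hypothetical data items.
links = {
    "device-1": ["account-1", "ip-1"],
    "account-1": ["user-1"],
    "device-2": ["ip-1"],
}
seeds = select_seeds({"device-1": 0.8, "device-2": 0.7, "device-3": 0.1}, threshold=0.5)
clusters = merge_overlapping(grow_cluster(seed, links) for seed in seeds)
```

In the sample data, the two seed devices share an IP address, so their separately grown clusters collapse into a single cluster.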

In various embodiments, the clustered data items may include various properties and characteristics, including various entity related information. Accordingly, various information related to entities (devices, financial accounts, users, and IP addresses, and the like) may be included in the data cluster.

Additional details and examples regarding clustering of related data items are given in U.S. Provisional Patent Application No. 61/919,653 (the “'653 application”), filed Dec. 20, 2013, and U.S. patent application Ser. No. 13/968,213 (the “'213 application”), filed Aug. 15, 2013, both of which are incorporated by reference herein in their entirety and for all purposes.

In an embodiment, a cluster graph may be generated and/or made available to an analyst or other user of the data analysis system. For example, an analyst may select a button (for example, an “Investigate in Graph” button) in a user interface of the system to view a cluster graph of a selected cluster. FIG. 3C illustrates an example of related money laundering entity data and/or growth of a cluster of related money laundering entity data in the form of a cluster graph, according to an embodiment of the present disclosure. In FIG. 3C, boxes indicate data items, while lines between boxes indicate links that connect data items. In the example of FIG. 3C, a seed computing device 304 (which may be a device ID, fingerprint, group of characteristics sufficient to identify a device, etc.) has been generated (such as by the process of FIGS. 2A-2B). Then, in a clustering step corresponding to block 244 (of FIG. 2C) and represented by the inner cluster dashed line 306, various related data items 308, 310, 312, 314, and 316 are added to the cluster. Additionally, in a subsequent clustering step represented by the outer cluster dashed line 318, various additional data items 320 and 322 that are related to previously clustered data items are added to the cluster. In subsequent clustering steps additional data items may be added to the cluster as indicated by the lines 326, 328, and 330.

Returning again to FIG. 2C, at block 246, if all the entities for clustering are not yet processed, the method continues back to block 242 to process additional entities. However, if all entities have been processed, the method optionally continues to FIG. 2D. At this point, each of the entities for clustering has been processed, and the system has determined, for each of the seed entity data items, a cluster of related data items.

FIG. 2D is a flowchart of an example of a cluster scoring method of the data analysis system, according to various embodiments of the present disclosure. The flowchart of FIG. 2D may correspond to block 180 of the flowchart of FIG. 1B. As described above, the cluster scoring method shown in FIG. 2D is optional. In general, each of the generated clusters may be scored in a manner similar to the scoring of the entities described above in reference to FIG. 2B. In the flowchart of FIG. 2D, block 252 indicates that each of the following blocks (254, 256, and 258) may be performed for each of the clusters generated by the cluster generation method of FIG. 2C.

At block 254, the data analysis system may access and/or receive cluster scoring criteria. The cluster scoring criteria, as with the money laundering scoring criteria discussed above, may include any number of rules or scoring strategies such that multiple scores may be generated for each cluster. Cluster scoring criteria may include any of the money laundering scoring criteria discussed above, may include any of the money laundering indicators described above, and/or may include any other scoring criteria and/or strategies described herein.

At block 256, the cluster scoring criteria may be applied to the clusters and cluster scores may be determined. In an embodiment, each cluster score may include a raw value and/or a weighted value as described above in reference to FIG. 2B. Additionally, as described above, the system may normalize the raw values of each of the scores before applying a relative weighting to arrive at a weighted value.

At optional block 258, a metascore may be generated for the clusters. The cluster metascore may be based on a combination or aggregation of the individual scores generated in block 256. Alternatively, the metascores may be separately determined scores. In an embodiment, a metascore may be calculated by summing, multiplying, and/or otherwise aggregating or averaging the various individual scores together. The metascore may, in an embodiment, capture the relative importance of each of the individual cluster scores by weighting each of the individual scores in a manner similar to that described above with reference to FIG. 2B.

In various embodiments, cluster metascores and/or scores may advantageously enable an analyst to directly compare and/or prioritize various clusters, and/or may advantageously be used by the data analysis system to prioritize a list of clusters and/or scores related to a cluster.

Additional details and examples regarding scoring of clusters of related data items are given in the '653 application, which is incorporated by reference herein in its entirety and for all purposes.

c. Example User Interface of the Data Analysis System

FIG. 4 illustrates an example money laundering data analysis user interface of the data analysis system, according to an embodiment of the present disclosure. In various embodiments, the example user interface may be used individually for entities and clusters, and/or entities and clusters may both be displayed in the user interface. In either case, the principles described below apply. The example user interface of FIG. 4 includes a list of entities (e.g. computing devices, bank accounts, user accounts, IP addresses, etc.) or clusters 402, a list of scores 404, and a detailed view of a score 406. In various embodiments, more or fewer elements may be included in the user interface, and/or the elements may be arranged differently. As shown, the user interface of FIG. 4 may include a list of entities/clusters in a first column, a list of scores associated with a selected entity/cluster in a middle column, and/or details associated with a selected score in a last column. Such an arrangement may advantageously enable an analyst to investigate various scores associated with an entity/cluster. Additionally, entities/clusters in such an interface may advantageously be prioritized according to any of multiple scores and/or metascores, as described above.

In the example user interface of FIG. 4, an analyst or user has selected “Entity 1” or “Cluster 1.” Accordingly, various scores associated with that entity/cluster may be displayed in the list of scores 404. For example, scores are listed for “Location Patterns” (an indicator representing the number of matched patterns of physical location movements of the entity) and “Hi Risk Country Logins” (an indicator representing the number of logins to financial accounts from high risk countries) involving the entity/cluster, among others. Additionally, in the example user interface, the analyst has selected the “Previous Investigations” score. Accordingly, details related to that score may be displayed in the detailed view 406.

According to an embodiment, various items of information may be included in the user interface that may be useful to an analyst in evaluating and/or investigating the analyzed entities/generated clusters. For example, metascores associated with each of the entities/clusters may be shown in the list of entities/clusters 402, and/or the entities/clusters may be prioritized according to the metascores. In another example, raw values (408) and/or weighted values (410) may be displayed in the list of scores 404 for each score. In the example shown, a metascore of “0.6” may be calculated for “Entity/Cluster 1” by, for example, averaging the various cluster scores (for example, (0.9+0.8+0.4+0.3)/4=0.6). In another example, the detailed view 406 may include a graph that shows additional information related to the selected score. For example, in FIG. 4, the graph shown in the detailed view 406 shows a distribution of the number of investigations into the entity/cluster by month. In other embodiments, various other detailed information may be included in the user interface of FIG. 4.
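By way of non-limiting illustration, the averaging example above and the prioritization of entities/clusters by metascore may be sketched in Python as follows. The scores for “Entity/Cluster 1” are taken from the example above; the other two entries, the function names, and the choice of a simple arithmetic mean are assumptions of this sketch.

```python
from statistics import fmean
from typing import Dict, List


def metascore(weighted_scores: List[float]) -> float:
    """Average the individual weighted scores of one entity or cluster."""
    if not weighted_scores:
        raise ValueError("at least one score is required")
    return fmean(weighted_scores)


def prioritize(scores_by_item: Dict[str, List[float]]) -> List[str]:
    """Order entities/clusters from highest metascore to lowest."""
    return sorted(scores_by_item,
                  key=lambda name: metascore(scores_by_item[name]),
                  reverse=True)


scores = {
    "Entity/Cluster 1": [0.9, 0.8, 0.4, 0.3],   # (0.9+0.8+0.4+0.3)/4
    "Entity/Cluster 2": [0.2, 0.1, 0.3, 0.2],   # illustrative values only
    "Entity/Cluster 3": [0.9, 0.9, 0.7, 0.9],   # illustrative values only
}
ranking = prioritize(scores)
```

A list ordered in this way corresponds to the prioritized list of entities/clusters 402 described above.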

In an embodiment, the data analysis system may, based on the entity activity data received, automatically evaluate the entity data items and/or generated clusters to determine a likelihood of an entity being involved in money laundering. For example, the system may determine that an entity or cluster having a metascore below a particular threshold is likely not involved in money laundering, while an entity or cluster having a metascore above another particular threshold likely is somehow involved in money laundering. In an embodiment, the system may determine that an entity/cluster having a metascore within a particular range of thresholds requires additional analysis by an analyst as the likelihood of money laundering is not conclusive. In an embodiment, an analyst may adjust the thresholds, the metascore calculations, and/or the weighting applied to the scores. Further, the analyst may mark various entities/clusters as, for example, involved in money laundering, likely involved in money laundering, likely not involved in money laundering, and/or not involved in money laundering. Additionally, the analyst may dispatch other analysts to review particular entities/clusters and/or mark particular entities/clusters for further analysis.
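By way of non-limiting illustration, the two-threshold evaluation described above may be sketched in Python as follows. The function name `triage`, the label strings, and the threshold values of 0.3 and 0.7 are hypothetical; in practice an analyst may adjust the thresholds as described above.

```python
def triage(metascore: float, lower: float, upper: float) -> str:
    """Classify an entity/cluster by comparing its metascore to two thresholds."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if metascore < lower:
        return "likely not involved"
    if metascore > upper:
        return "likely involved"
    # Between the thresholds the likelihood is not conclusive.
    return "requires analyst review"


# Illustrative thresholds only.
LOWER, UPPER = 0.3, 0.7
decisions = {name: triage(score, LOWER, UPPER)
             for name, score in {"Entity 1": 0.6, "Entity 2": 0.1, "Entity 3": 0.85}.items()}
```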

In various embodiments, other aspects and/or advantages of the data analysis system may include: integration with structured data sources (for example, transactions, online access logs/web logs, mobile app information/data collected, account and customer details) that may be extremely large (for example, having millions, tens of millions, or billions of rows of data/data items); analysis of such extremely large data sources (for example, millions, tens of millions, or billions of entity data items (for example, financial accounts, financial account users, devices and IP addresses accessing financial accounts, and related transactions) may be analyzed by the data analysis system in a short period of time (for example, on the order of seconds or minutes)); real-time analysis of such extremely large data sources; return of search results, entity scores/rankings, clusters, and/or the like on the order of seconds; automatic analysis or processing of entity data items (for example, on a schedule or in response to new data items being pushed to the system by a data source); mapping of data and scores to objects and links for the analyst without the analyst requiring any technical/SQL knowledge; enabling analysts to discover non-obvious and/or previously unknown links between various items of entity data; and export functionality (for example, to CSV, PPT, PDF, and DOC formats) that allows users/analysts to collaborate with one another as well as with external parties.

d. Example Implementation Mechanisms/Systems

FIG. 5 illustrates components of an illustrative server computing system 110, according to an embodiment. As described above in reference to FIG. 1, the server computing system 110 may comprise one or more computing devices that may perform a variety of tasks to implement the various operations of the data analysis system. As shown, the server computing system 110 may include one or more central processing units (CPUs) 860, a network interface 850, a memory 820, and a storage 830 (for example, data store 140 of FIG. 1), each connected to an interconnect (bus) 840. The server computing system 110 may also include an I/O device interface 870 connecting I/O devices 875 (for example, keyboard, display, mouse, and/or other input/output devices) to the computing system 110. Further, in the context of this disclosure, the computing elements shown in server computing system 110 may correspond to a physical computing system (for example, a system in a data center, a computer server, a desktop computer, a laptop computer, and/or the like) and/or may be a virtual computing instance executing within a hosted computing environment.

The CPU 860 may retrieve and execute programming instructions stored in memory 820, as well as store and retrieve application data residing in memory 820. The bus 840 may be used to transmit programming instructions and application data between the CPU 860, I/O device interface 870, storage 830, network interface 850, and memory 820. Note that the CPU 860 is included to be representative of, for example, a single CPU, multiple CPUs, a single CPU having multiple processing cores, a CPU with an associated memory management unit, and the like.

The memory 820 is included to be representative of, for example, a random access memory (RAM), cache and/or other dynamic storage devices for storing information and instructions to be executed by CPU 860. Memory 820 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by CPU 860. Such instructions, when stored in storage media accessible to CPU 860, render server computing system 110 into a special-purpose machine that is customized to perform the operations specified in the instructions.

The storage 830 may be a disk drive storage device, a read only memory (ROM), or other static, non-transitory, and/or computer-readable storage device or medium coupled to bus 840 for storing static information and instructions for CPU 860. Although shown as a single unit, the storage 830 may be a combination of fixed and/or removable storage devices, such as fixed disc drives, removable memory cards, optical storage, network attached storage (NAS), and/or a storage area network (SAN).

Programming instructions, such as the rules engine 120 and/or the workflow engine 125, may be stored in the memory 820 and/or storage 830 in various software modules. The modules may be stored in a mass storage device (such as storage 830) as executable software codes that are executed by the server computing system 110. These and other modules may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

According to an embodiment, the storage 830 may include various money laundering indicators 520, various money laundering scoring criteria 522, various cluster strategies 524, various cluster scoring criteria 526, and/or the like. As described above, the money laundering indicators 520 may be used in determining a likelihood that an entity data item is involved in money laundering, the money laundering scoring criteria 522 may be used in scoring entity data items, the cluster strategies 524 may be used in generating clusters of related data items, and the cluster scoring criteria 526 may be used in scoring clusters.

Further, according to an embodiment, the memory 820 stores an entity list 530, a cluster list 532, the rules engine 120, and the workflow engine 125 (as described with reference to the various figures above). The rules engine 120 may execute the money laundering indicators 520, the money laundering scoring criteria 522, the cluster strategies 524, the cluster scoring criteria 526, and/or the like, as described above in reference to the various figures. The entity list 530 may be, for example, a list of computing devices, financial accounts, and/or IP addresses to which the money laundering indicators and/or money laundering scoring criteria are applied. The entity list 530 may further be accessed by the workflow engine 125 for presentation in a ranked order to a user via a user interface. Similarly, the cluster list 532 may be, for example, a list of clusters of related data items determined by the system based on various cluster strategies and to which the cluster scoring criteria are applied.

Although shown in memory 820, the entity list 530, rules engine 120, cluster list 532, and/or workflow engine 125 may be stored in memory 820, storage 830, and/or split between memory 820 and storage 830. Likewise, the various money laundering indicators 520, money laundering scoring criteria 522, cluster strategies 524, and/or cluster scoring criteria 526 may be stored in memory 820, storage 830, and/or split between memory 820 and storage 830.

The network 150 may be any wired network, wireless network, or combination thereof. In addition, the network 150 may be a personal area network, local area network, wide area network, cable network, satellite network, cellular telephone network, or combination thereof. Protocols and components for communicating via the Internet or any of the other aforementioned types of communication networks are well known to those skilled in the art of computer communications and thus, need not be described in more detail herein.

As described above in reference to FIG. 1, the server computing system 110 may be in communication with one or more data sources 160. Communication between the server computing system 110 and the data sources 160 may be via the network 150 and/or direct, as shown by the solid and dashed lines. In an embodiment, an optional data aggregator/formatter device and/or system 502 may aggregate various data from multiple data sources and/or may format the data such that it may be received by the server computing system 110 in a standardized and/or readable format. For example, when multiple data sources contain and/or provide data in various formats, the data aggregator/formatter may convert all the data into a similar format. Accordingly, in an embodiment the system may receive and/or access money laundering entity data from, or via, a device or system such as the data aggregator/formatter 502.
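By way of non-limiting illustration, the conversion of data from multiple data sources into a similar format, as may be performed by the data aggregator/formatter 502, may be sketched in Python as follows. The choice of CSV and JSON as source formats, the field names, the field mappings, and the sample records are all hypothetical and are not part of the disclosure.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical standardized record: field name mapped to a string value.
Record = Dict[str, str]


def parse_csv(text: str) -> List[Record]:
    """Read records from a CSV source whose first row names the fields."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text: str) -> List[Record]:
    """Read records from a JSON source holding a list of flat objects."""
    return [{key: str(value) for key, value in item.items()}
            for item in json.loads(text)]


def standardize(records: List[Record], field_map: Dict[str, str]) -> List[Record]:
    """Rename source-specific fields to the shared field names."""
    return [{field_map.get(key, key): value for key, value in record.items()}
            for record in records]


# Two illustrative sources that describe the same kind of data differently.
csv_source = "ip,account\n203.0.113.7,ACCT-1\n"
json_source = '[{"ip_address": "198.51.100.4", "account_id": "ACCT-2"}]'

combined = (
    standardize(parse_csv(csv_source), {"ip": "ip_address", "account": "account_id"})
    + standardize(parse_json(json_source), {})
)
```

After this step, records from both sources carry the same field names and may be received by the server computing system 110 without source-specific handling.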

As also described above, in various embodiments the system may be accessible by an analyst (or other operator or user) through a web-based viewer, such as a web browser. In this embodiment, the user interface may be generated by the server computing system 110 and transmitted to the web browser of the analyst. Alternatively, data necessary for generating the user interface may be provided by the server computing system 110 to the browser, where the user interface may be generated. The analyst/user may then interact with the user interface through the web-browser. In an embodiment, the user interface of the data analysis system may be accessible through a dedicated software application. In an embodiment, the client computing device 130 may be a mobile computing device, and the user interface of the data analysis system may be accessible through such a mobile computing device (for example, a smartphone and/or tablet). In this embodiment, the server computing system 110 may generate and transmit a user interface to the mobile computing device. Alternatively, the mobile computing device may include modules for generating the user interface, and the server computing system 110 may provide user interaction data to the mobile computing device. In an embodiment, the server computing system 110 comprises a mobile computing device. Additionally, in various embodiments any of the components and/or functionality described above with reference to the server computing system 110 (including, for example, memory, storage, CPU, network interface, I/O device interface, and the like), and/or similar or corresponding components and/or functionality, may be included in the client computing device 130.

According to various embodiments, the data analysis system and various methods and techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, server computer systems, portable computer systems, handheld devices, networking devices or any other device or combination of devices that incorporate hard-wired and/or program logic to implement the techniques.

Computing devices of the data analysis system may generally be controlled and/or coordinated by operating system software, such as iOS, Android, Chrome OS, Windows XP, Windows Vista, Windows 7, Windows 8, Windows Server, Windows CE, Unix, Linux, SunOS, Solaris, Blackberry OS, VxWorks, or other compatible operating systems. In other embodiments, the computing devices may be controlled by a proprietary operating system. Conventional operating systems control and schedule computer processes for execution, perform memory management, provide file system, networking, and I/O services, and provide user interface functionality, such as a graphical user interface (“GUI”), among other things.

In general, the word “module,” as used herein, refers to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, Lua, C or C++. A software module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software modules may be callable from other modules or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution). Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware devices (such as processors and CPUs) may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors. The modules or computing device functionality described herein are preferably implemented as software modules, but may be represented in hardware devices. Generally, the modules described herein refer to software modules that may be combined with other modules or divided into sub-modules despite their physical organization or storage.

Server computing system 110 may implement various techniques and methods described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which, in combination with various software modules, causes the server computing system 110 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by server computing system 110 in response to CPU 860 executing one or more sequences of one or more modules and/or instructions contained in memory 820. Such instructions may be read into memory 820 from another storage medium, such as storage 830. Execution of the sequences of instructions contained in memory 820 may cause CPU 860 to perform the processes and methods described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term “non-transitory media,” and similar terms, as used herein refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage 830. Volatile media includes dynamic memory, such as memory 820. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.

Non-transitory media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between non-transitory media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 840. Transmission media may also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to CPU 860 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer may load the instructions and/or modules into its dynamic memory and send the instructions over a telephone or cable line using a modem. A modem local to server computing system 110 may receive the data on the telephone/cable line and use a converter device including the appropriate circuitry to place the data on bus 840. Bus 840 carries the data to memory 820, from which CPU 860 retrieves and executes the instructions. The instructions received by memory 820 may optionally be stored on storage 830 either before or after execution by CPU 860.

V. Additional Embodiments

While the foregoing is directed to various embodiments, other and further embodiments may be devised without departing from the basic scope thereof. For example, aspects of the present disclosure may be implemented in hardware or software or in a combination of hardware and software. An embodiment of the disclosure may be implemented as a program product for use with a computer system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and may be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, flash memory, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored. Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code modules executed by one or more computer systems or computer processors comprising computer hardware. The processes and algorithms may alternatively be implemented partially or wholly in application-specific circuitry.

The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and subcombinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.

Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.

The term “comprising” as used herein should be given an inclusive rather than exclusive interpretation. For example, a general purpose computer comprising one or more processors should not be interpreted as excluding other computer components, and may possibly include such components as memory, input/output devices, and/or network interfaces, among others.

Any process descriptions, elements, or blocks in the flow diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the embodiments described herein in which elements or functions may be deleted, executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.

It should be emphasized that many variations and modifications may be made to the above-described embodiments, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure. The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention may be practiced in many ways. As is also stated above, it should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the invention with which that terminology is associated. The scope of the invention should therefore be construed in accordance with the appended claims and any equivalents thereof. 

1. (canceled)
 2. A method comprising: by a computer system comprising one or more computer hardware processors and one or more storage devices, identifying, from a plurality of entity data items, a first entity data item that corresponds to at least one of: an entity from a predetermined list of entities that have been identified as being associated with potential money laundering activities, an entity associated with a high-risk area for money laundering, or an entity associated with a transaction indicative of a potential money laundering fee; determining, from a plurality of Internet Protocol address data items, a first Internet Protocol address associated with the first entity data item, wherein a computing device associated with the first entity data item was assigned the first Internet Protocol address; determining a first account that was accessed by the first Internet Protocol address, the first account different than the first entity data item; designating the first account as a seed; identifying one or more related data items associated with the seed; generating a cluster based at least on the seed, wherein generating the cluster comprises: adding the seed to the cluster; and adding the one or more related data items to the cluster; and causing presentation, in a user interface, of at least some data from the cluster.
 3. The method of claim 2, further comprising: determining a number of occurrences where the first account was accessed by the first Internet Protocol address.
 4. The method of claim 3, wherein designating the first account as the seed further comprises determining that the number of occurrences where the first account was accessed by the first Internet Protocol address exceeds a threshold.
 5. The method of claim 3, wherein generating the cluster further comprises: adding corresponding one or more data items to the cluster for each occurrence where the first account was accessed by the first Internet Protocol address.
 6. The method of claim 5, further comprising: causing presentation, in the user interface, of the cluster as a visual graph comprising a representation of the seed as linked to the one or more related data items and linked to the corresponding one or more data items for each occurrence where the first account was accessed by the first Internet Protocol address.
 7. The method of claim 5, further comprising: accessing, from log data, a log record comprising a first account identifier for the first account and a computing device identifier; and generating a device data item corresponding to the computing device identifier, wherein identifying the one or more related data items associated with the seed further comprises: determining that the device data item is related to the first account based at least in part on the log record comprising the first account identifier and the computing device identifier.
 8. A non-transitory computer storage medium storing computer executable instructions that when executed by a computer hardware processor perform operations comprising: identifying, from a plurality of entity data items, a first entity data item that corresponds to at least one of: an entity from a predetermined list of entities that have been identified as being associated with potential money laundering activities, an entity associated with a high-risk area for money laundering, or an entity associated with a transaction indicative of a potential money laundering fee; determining a first Internet Protocol address associated with the first entity data item, wherein a computing device associated with the first entity data item was assigned the first Internet Protocol address; determining a first account that was accessed by the first Internet Protocol address, the first account different than the first entity data item; designating the first account as a seed; identifying one or more related data items associated with the seed; generating a cluster based at least on the seed, wherein generating the cluster comprises: adding the seed to the cluster; and adding the one or more related data items to the cluster; and causing presentation, in a user interface, of at least some data from the cluster.
 9. The non-transitory computer storage medium of claim 8, wherein the operations further comprise: determining a number of occurrences where the first account was accessed by the first Internet Protocol address.
 10. The non-transitory computer storage medium of claim 9, wherein designating the first account as the seed further comprises determining that the number of occurrences where the first account was accessed by the first Internet Protocol address exceeds a threshold.
 11. The non-transitory computer storage medium of claim 9, wherein generating the cluster further comprises: adding corresponding one or more data items to the cluster for each occurrence where the first account was accessed by the first Internet Protocol address.
 12. The non-transitory computer storage medium of claim 11, wherein the operations further comprise: causing presentation, in the user interface, of the cluster as a visual graph comprising a representation of the seed as linked to the corresponding one or more data items for each occurrence where the first account was accessed by the first Internet Protocol address.
 13. The non-transitory computer storage medium of claim 8, wherein the operations further comprise: accessing, from log data, a log record comprising a first account identifier for the first account and an additional identifier; and generating an additional data item corresponding to the additional identifier, wherein identifying the one or more related data items associated with the seed further comprises: determining that the additional data item is related to the first account based at least in part on the log record comprising the first account identifier and the additional identifier.
 14. The non-transitory computer storage medium of claim 13, wherein the additional identifier corresponds to at least one of: a user identifier, a computing device identifier, a cookie identifier, or a hash identifier.
 15. A system comprising: a non-transitory computer-readable storage medium configured to store: a plurality of entity data items; and a plurality of Internet Protocol address data items; and a computer hardware processor in communication with the non-transitory computer-readable storage medium that executes computer executable instructions to: identify, from the plurality of entity data items, a first entity data item that corresponds to at least one of: an entity from a predetermined list of entities that have been identified as being associated with potential money laundering activities, an entity associated with a high-risk area for money laundering, or an entity associated with a transaction indicative of a potential money laundering fee; determine, from the plurality of Internet Protocol address data items, a first Internet Protocol address associated with the first entity data item, wherein a computing device associated with the first entity data item was assigned the first Internet Protocol address; determine a first account that was accessed by the first Internet Protocol address, the first account different than the first entity data item; designate the first account as a seed; identify one or more related data items associated with the seed; and generate a cluster based at least on the seed, wherein generating the cluster comprises: adding the seed to the cluster; and adding the one or more related data items to the cluster.
 16. The system of claim 15, wherein the first entity data item comprises at least one of: an account data item, a user data item, or an organization data item.
 17. The system of claim 15, wherein the computer hardware processor further executes the computer executable instructions to: determine a number of occurrences where the first account was accessed by the first Internet Protocol address.
 18. The system of claim 17, wherein designating the first account as the seed further comprises determining that the number of occurrences where the first account was accessed by the first Internet Protocol address exceeds a threshold.
 19. The system of claim 17, wherein generating the cluster further comprises: adding corresponding one or more data items to the cluster for each occurrence where the first account was accessed by the first Internet Protocol address.
 20. The system of claim 19, wherein the computer hardware processor further executes the computer executable instructions to: cause presentation, in a user interface, of the cluster as a visual graph comprising a representation of the seed as linked to the corresponding one or more data items for each occurrence where the first account was accessed by the first Internet Protocol address.
 21. The system of claim 15, wherein the computer hardware processor further executes the computer executable instructions to: access, from log data, a log record comprising a first account identifier for the first account and an additional identifier; and generate an additional data item corresponding to the additional identifier, wherein identifying the one or more related data items associated with the seed further comprises: determining that the additional data item is related to the first account based at least in part on the log record comprising the first account identifier and the additional identifier.
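
By way of non-limiting illustration only, the seed designation and cluster generation recited above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation of any embodiment: the names AccessRecord, Cluster, find_seeds, and build_cluster, the in-memory record layout, and the sample identifiers and addresses are all hypothetical and do not appear in the disclosure. The sketch assumes that the Internet Protocol addresses of flagged entities have already been determined, and it treats the occurrence threshold as a simple count.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessRecord:
    """Hypothetical log record: an account accessed from an IP address by a device."""
    account_id: str
    ip_address: str
    device_id: str


@dataclass
class Cluster:
    """A seed plus the data items found to be related to it."""
    seed: str
    related: set = field(default_factory=set)


def find_seeds(flagged_entity_ips, records, threshold=1):
    """Designate as seeds the accounts accessed from a flagged entity's IP address.

    flagged_entity_ips maps a flagged entity identifier to the set of IP
    addresses associated with it (assumed to be determined elsewhere).
    An account becomes a seed only if the number of such accesses
    exceeds `threshold`, and only if it differs from the flagged entity.
    """
    seeds = set()
    for entity_id, ips in flagged_entity_ips.items():
        counts = Counter(
            r.account_id
            for r in records
            if r.ip_address in ips and r.account_id != entity_id
        )
        seeds.update(acct for acct, n in counts.items() if n > threshold)
    return seeds


def build_cluster(seed, records):
    """Add the seed to a cluster, then add IP and device items from its log records."""
    cluster = Cluster(seed=seed)
    for r in records:
        if r.account_id == seed:
            cluster.related.add(("ip", r.ip_address))
            cluster.related.add(("device", r.device_id))
    return cluster


# Hypothetical sample data (documentation-range IP addresses).
records = [
    AccessRecord("acct-1", "203.0.113.7", "dev-A"),
    AccessRecord("acct-1", "203.0.113.7", "dev-B"),
    AccessRecord("acct-2", "203.0.113.7", "dev-C"),
    AccessRecord("acct-3", "198.51.100.2", "dev-D"),
]
flagged = {"entity-9": {"203.0.113.7"}}

seeds = find_seeds(flagged, records, threshold=1)
clusters = {seed: build_cluster(seed, records) for seed in seeds}
```

In this sketch, acct-1 is accessed twice from the flagged address and is therefore designated a seed, while acct-2, accessed only once, does not exceed the threshold. A presentation layer, omitted here, could render each resulting cluster as a graph linking the seed to its related items.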